Update Algorithm.md

Add testcases to force big values for dual/slack
Separate function top_level_blossom()
2024-07-28 11:38:05 +02:00 · 2024-07-28 11:38:05 +02:00 · 2024-07-21 15:32:41 +02:00 · 2024-07-21 15:32:41 +02:00 · 2024-07-21 15:32:41 +02:00 · 2024-07-11 21:28:41 +02:00
21 changed files with 2997 additions and 1036 deletions
--- a/README.md
+++ b/README.md
@ -1,39 +1,72 @@
 # Maximum Weighted Matching

-This repository contains a Python 3 implementation of maximum weighted matching in general graphs.
+This repository contains implementations of maximum weighted matching for general graphs.

 In graph theory, a _matching_ is a subset of edges that does not use any vertex more than once.
 For an edge-weighted graph, a _maximum weight matching_ is a matching that achieves
 the largest possible sum of weights of matched edges.

 The code in this repository is based on a variant of the blossom algorithm that runs in
-_O(n<sup>3</sup>)_ steps.
+_O(n m log n)_ steps.
 See the file [Algorithm.md](doc/Algorithm.md) for a detailed description.

-You may find this repository useful if ...

- - you want a stand-alone, pure Python module that calculates maximum weight matchings
-   with reasonable efficiency;
- - or you want to play around with a maximum weight matching algorithm to learn how it works.
+## Python

-This repository is probably not the best place if ...
+The folder [python/](python/) contains a Python 3 package `mwmatching` that implements
+maximum weighted matching.
+The Python code is self-contained -- it has no dependencies outside the standard library.

- - you need a very fast routine that quickly matches large graphs (more than ~ 1000 vertices).
-   In that case I recommend [LEMON](http://lemon.cs.elte.hu/trac/lemon),
-   a C++ library that provides a blazingly fast implementation.
- - or you want a high quality Python package that provides abstract graph data structures
-   and graph algorithms.
-   In that case I recommend [NetworkX](https://networkx.org/).
- - or you are only interested in bipartite graphs.
-   There are simpler and faster algorithms for matching bipartite graphs.
+To use the package, set your `PYTHONPATH` to the location of the package `mwmatching`.
+Alternatively, you can install the package into your Python environment as follows:
+
+```bash
+cd python
+pip install .
+```
+
+Using the package is easy.
+You describe the input graph by listing its edges.
+Each edge is represented as a tuple of vertex indices and the weight of the edge.
+The example below finds a matching in a graph with 5 vertices and 5 edges.
+The maximum weight matching contains two edges and has total weight 11.
+
+```python
+from mwmatching import maximum_weight_matching
+edges = [(0, 1, 3), (1, 2, 8), (1, 4, 6), (2, 3, 5), (2, 4, 7)]
+matching = maximum_weight_matching(edges)
+print(matching)  # prints [(1, 4), (2, 3)]
+```
+
+## C++
+
+The folder [cpp/](cpp/) contains a header-only C++ implementation of maximum weighted matching.
+
+**NOTE:**
+The C++ code currently implements a slower algorithm that runs in _O(n<sup>3</sup>)_ steps.
+I plan to eventually update the C++ code to implement the faster _O(n m log n)_ algorithm.
+
+The C++ code is self-contained and can easily be linked into an application.
+It is also reasonably efficient.
+
+For serious use cases, [LEMON](https://lemon.cs.elte.hu/trac/lemon) may be a better choice.
+LEMON is a C++ library that provides a very fast and robust implementation of
+maximum weighted matching and many other graph algorithms.
+To my knowledge, it is the only free software library that provides a high-quality
+matching algorithm.


 ## Repository structure

 ```
    python/
-        mwmatching.py          : Python implementation of maximum weight matching
-        test_mwmatching.py     : Unit tests
+        mwmatching/            : Python package for maximum weight matching
+            __init__.py
+            algorithm.py       : Algorithm implementation
+            datastruct.py      : Internal data structures
+        tests/
+            test_algorithm     : Unit tests for the algorithm
+            test_datastruct.py : Unit tests for data structures
        run_matching.py        : Command-line program to run the matching algorithm 

    cpp/
@ -76,6 +109,11 @@ My implementation follows the description of the algorithm as it appears in the
 However, earlier versions of the algorithm were invented and improved by several other scientists.
 See the file [Algorithm.md](doc/Algorithm.md) for links to the most important papers.

+I used some ideas from the source code of the `MaxWeightedMatching` class in
+[LEMON](https://lemon.cs.elto.hu/trac/lemon):
+the technique to implement lazy updates of vertex dual variables,
+and the approach to re-use alternating trees after augmenting the matching.
+
 I used Fortran programs `hardcard.f`, `t.f` and `tt.f` by R. B. Mattingly and N. Ritchey
 to generate test graphs.
 These programs are part of the DIMACS Network Flows and Matching implementation challenge.
@ -83,7 +121,7 @@ They can be found in the
 [DIMACS Netflow archive](http://archive.dimacs.rutgers.edu/pub/netflow/).

 To check the correctness of my results, I used other maximum weight matching solvers:
-the `MaxWeightedMatching` module in [LEMON](http://lemon.cs.elte.hu/trac/lemon),
+the `MaxWeightedMatching` module in LEMON,
 and the program
 [`wmatch`](http://archive.dimacs.rutgers.edu/pub/netflow/matching/weighted/solver-1/)
 by Edward Rothberg.
@ -94,7 +132,7 @@ by Edward Rothberg.
 The following license applies to the software in this repository, excluding the folder `doc`.
 This license is sometimes called the MIT License or the Expat License:

-> Copyright (c) 2023 Joris van Rantwijk
+> Copyright (c) 2023-2024 Joris van Rantwijk
 >
 > Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
 >
--- a/doc/Algorithm.md
+++ b/doc/Algorithm.md
@ -4,19 +4,20 @@
 ## Introduction

 This document describes the implementation of an algorithm that computes a maximum weight matching
-in a general graph in time _O(n<sup>3</sup>)_, where _n_ is the number of vertices in the graph.
+in a general graph in time _O(n (n + m) log n)_, where _n_ is the number of vertices in
+the graph and _m_ is the number of edges.

 In graph theory, a _matching_ is a subset of edges such that none of them share a common vertex.

 A _maximum cardinality matching_ is a matching that contains the largest possible number of edges
 (or equivalently, the largest possible number of vertices).

-For a graph that has weights attached to its edges, a _maximum weight matching_
+If a graph has weights assigned to its edges, a _maximum weight matching_
 is a matching that achieves the largest possible sum of weights of matched edges.
 An algorithm for maximum weight matching can obviously also be used to calculate a maximum
 cardinality matching by simply assigning weight 1 to all edges.

-Certain computer science problems can be understood as _restrictions_ of maximum weighted matching
+Certain related problems can be understood as _restrictions_ of maximum weighted matching
 in general graphs.
 Examples are: maximum matching in bipartite graphs, maximum cardinality matching in general graphs,
 and maximum weighted matching in general graphs with edge weights limited to integers
@ -48,7 +49,7 @@ It is based on the ideas of Edmonds, but uses different data structures to reduc
 of work.

 In 1983, Galil, Micali and Gabow published a maximum weighted matching algorithm that runs in
-time _O(n\*m\*log(n))_ [[4]](#galil_micali_gabow1986) .
+time _O(n m log n)_ [[4]](#galil_micali_gabow1986) .
 It is an implementation of Edmonds' blossom algorithm that uses advanced data structures
 to speed up critical parts of the algorithm.
 This algorithm is asymptotically faster than _O(n<sup>3</sup>)_ for sparse graphs,
@ -63,45 +64,32 @@ of the literature.
 The paper describes a maximum weighted matching algorithm that is similar to Edmonds'
 blossom algorithm, but carefully implemented to run in time _O(n<sup>3</sup>)_.
 It then sketches how advanced data structures can be added to arrive at the Galil-Micali-Gabow
-algorithm that runs in time _O(n\*m\*log(n))_.
+algorithm that runs in time _O(n m log n)_.

 In 1990, Gabow published a maximum weighted matching algorithm that runs in time
-_O(n\*m + n<sup>2</sup>\*log(n))_ [[6]](#gabow1990) .
+_O(n m + n<sup>2</sup> log n)_ [[6]](#gabow1990) .
 It uses several advanced data structures, including Fibonacci heaps.
 Unfortunately I don't understand this algorithm at all.


 ## Choosing an algorithm

-I selected the _O(n<sup>3</sup>)_ variant of Edmonds' blossom algorithm as described by
-Galil [[5]](#galil1986) .
-This algorithm is usually credited to Gabow [[3]](#gabow1974), but I find the description
-in [[5]](#galil1986) easier to understand.
+I selected the _O(n m log n)_ algorithm by Galil, Micali and Gabow
+[[4]](#galil_micali_gabow1986).

-This is generally a fine algorithm.
-One of its strengths is that it is relatively easy to understand, especially compared
-to the more recent algorithms.
-Its run time is asymptotically optimal for complete graphs (graphs that have an edge
-between every pair of vertices).
-
-On the other hand, the algorithm is suboptimal for sparse graphs.
-It is possible to construct highly sparse graphs, having _m = O(n)_,
-that cause this algorithm to perform _&Theta;(n<sup>3</sup>)_ steps.
-In such cases the Galil-Micali-Gabow algorithm would be significantly faster.
-
-Then again, there is a question of how important the worst case is for practical applications.
-I suspect that the simple algorithm typically needs only about _O(n\*m)_ steps when running
-on random sparse graphs with random weights, i.e. much faster than its worst case bound.
-
-After trading off these properties, I decided that I prefer an algorithm that is understandable
-and has decent performance, over an algorithm that is faster in specific cases but also
-significantly more complicated.
+This algorithm is asymptotically optimal for sparse graphs.
+It has also been shown to be quite fast in practice on several types of graphs
+including random graphs [[7]](#mehlhorn_schafer2002).

+This algorithm is more difficult to implement than the older _O(n<sup>3</sup>)_ algorithm.
+In particular, it requires a specialized data structure to implement concatenable priority queues.
+This increases the size and complexity of the code quite a bit.
+However, in my opinion the performance improvement is worth the extra effort.

 ## Description of the algorithm

-My implementation closely follows the description by Zvi Galil in [[5]](#galil1986) .
-I recommend to read that paper before diving into my description below.
+My implementation roughly follows the description by Zvi Galil in [[5]](#galil1986) .
+I recommend reading that paper before diving into my description below.
 The paper explains the algorithm in depth and shows how it relates to matching
 in bipartite graphs and non-weighted graphs.

@ -253,14 +241,14 @@ have _slack_.
 An augmenting path that consists only of tight edges is _guaranteed_ to increase the weight
 of the matching as much as possible.

-While searching for an augmenting path, we simply restrict the search to tight edges,
+While searching for an augmenting path, we restrict the search to tight edges,
 ignoring all edges that have slack.
 Certain explicit actions of the algorithm cause edges to become tight or slack.
 How this works will be explained later.

 To find an augmenting path, the algorithm searches for alternating paths that start
 in an unmatched vertex.
-The collection of alternating paths forms a forest of trees.
+The collection of such alternating paths forms a forest of trees.
 Each tree is rooted in an unmatched vertex, and all paths from the root to the leaves of a tree
 are alternating paths.
 The nodes in these trees are top-level blossoms.
@ -289,8 +277,8 @@ an odd-length alternating cycle.
 The lowest common ancestor node in the alternating tree forms the beginning and end
 of the alternating cycle.
 In this case a new blossom must be created by shrinking the cycle.
-If the two S-blossoms are in different alternating trees, the edge that links the blossoms
-is part of an augmenting path between the roots of the two trees.
+On the other hand, if the two S-blossoms are in different alternating trees,
+the edge that links the blossoms is part of an augmenting path between the roots of the two trees.

 ![Figure 3](figures/graph3.png) <br>
 *Figure 3: Growing alternating trees*
@ -314,7 +302,7 @@ The search procedure considers these vertices one-by-one and tries to use them t
 either grow the alternating tree (thus adding new vertices to the queue),
 or discover an augmenting path or a new blossom.

-In detail, the search for an augmenting path proceeds as follows:
+The search for an augmenting path proceeds as follows:

 - Mark all top-level blossoms as _unlabeled_.
 - Initialize an empty queue _Q_.
@ -469,9 +457,9 @@ $$ \pi_{x,y} = u_x + u_y + \sum_{(x,y) \in B} z_B - w_{x,y} $$

 An edge is _tight_ if and only if its slack is zero.
 Given the values of the dual variables, it is very easy to calculate the slack of an edge
-which is not contained in any blossom: simply add the duals of its incident vertices and
+which is not contained in any blossom: add the duals of its incident vertices and
 subtract the weight.
-To check whether an edge is tight, simply compute its slack and check whether it is zero.
+To check whether an edge is tight, simply compute its slack and compare it to zero.

 Calculating the slack of an edge that is contained in one or more blossoms is a little tricky,
 but fortunately we don't need such calculations.
@ -504,7 +492,7 @@ At that point the maximum weight matching has been found.
 When the matching algorithm is finished, the constraints can be checked to verify
 that the matching is optimal.
 This check is simpler and faster than the matching algorithm itself.
-It can therefore be a useful way to guard against bugs in the matching algorithm.
+It can therefore be a useful way to guard against bugs in the algorithm.

 ### Rules for updating dual variables

@ -534,11 +522,12 @@ It then changes dual variables as follows:
 - _z<sub>B</sub> &larr; z<sub>B</sub> &#x2212; 2 * &delta;_ for every non-trivial T-blossom _B_

 Dual variables of unlabeled blossoms and their vertices remain unchanged.
-Dual variables _z<sub>B</sub>_ of non-trivial sub-blossoms also remain changed;
+Dual variables _z<sub>B</sub>_ of non-trivial sub-blossoms also remain unchanged;
 only top-level blossoms have their _z<sub>B</sub>_ updated.

-Note that this update does not change the slack of edges that are either matched,
-or linked in the alternating tree, or contained in a blossom.
+Note that these rules ensure that no change occurs to the slack of any edge which is matched,
+or part of an alternating tree, or contained in a blossom.
+Such edges are tight and remain tight through the update.
 However, the update reduces the slack of edges between S blossoms and edges between S-blossoms
 and unlabeled blossoms.
 It may cause some of these edges to become tight, allowing them to be used
@ -569,25 +558,62 @@ If the dual update is limited by _&delta;<sub>4</sub>_, it causes the dual varia
 a T-blossom to become zero.
 The next step is to expand that blossom.

-A dual update may find that _&delta; = 0_, implying that the dual variables don't change.
+### Discovering tight edges through delta steps
+
+A delta step may find that _&delta; = 0_, implying that the dual variables don't change.
 This can still be useful since all types of updates have side effects (adding an edge
 to an alternating tree, or expanding a blossom) that allow the algorithm to make progress.
+In fact, it is convenient to let the dual update mechanism drive the entire process of discovering
+tight edges and growing alternating trees.

-During a single _stage_, the algorithm may iterate several times between scanning tight edges and
-updating dual variables.
-These iterations are called _substages_.
-To clarify: A stage is the process of growing alternating trees until an augmenting path is found.
+In my description of the search algorithm above, I stated that upon discovery of a tight edge
+between a newly labeled S-vertex and an unlabeled vertex or a different S-blossom, the edge should
+be used to grow the alternating tree or to create a new blossom or to form an augmenting path.
+However, it turns out to be easier to postpone the use of such edges until the next delta step.
+While scanning newly labeled S-vertices, edges to unlabeled vertices or different S-blossoms
+are discovered but not yet used.
+Such edges are merely registered in a suitable data structure.
+Even if the edge is tight, it is registered rather than used right away.
+
+Once the scan completes, a delta step will be done.
+If any tight edges were discovered during the scan, the delta step will find that either
+_&delta;<sub>2</sub> = 0_ or _&delta;<sub>3</sub> = 0_.
+The corresponding step (growing the alternating tree, creating a blossom or augmenting
+the matching) will occur at that point.
+If no suitable tight edges exist, a real (non-zero) change of dual variables will occur.
+
+The search for an augmenting path becomes as follows:
+
+ - Mark all top-level blossoms as _unlabeled_.
+ - Initialize an empty queue _Q_.
+ - Assign label S to all top-level blossoms that contain an unmatched vertex.
+   Add all vertices inside such blossoms to _Q_.
+ - Repeat until either an augmenting path is found or _&delta;<sub>1</sub> = 0_:
+     - Scan all vertices in Q as described earlier.
+       Register edges to unlabeled vertices or other S-blossoms.
+       Do not yet use such edges to change the alternating tree, even if the edge is tight.
+     - Calculate _&delta;_ and update dual variables as described above.
+     - If _&delta; = &delta;<sub>1</sub>_, end the search.
+       The maximum weight matching has been found.
+     - If _&delta; = &delta;<sub>2</sub>_, use the corresponding edge to grow the alternating tree.
+       Assign label T to the unlabeled blossom.
+       Then assign label S to its mate and add the new S-vertices to _Q_.
+     - If _&delta; = &delta;<sub>3</sub>_ and the corresponding edge connects two S-blossoms
+       in the same alternating tree, use the edge to create a new blossom.
+       Add the new S-vertices to _Q_.
+     - If _&delta; = &delta;<sub>3</sub>_ and the corresponding edge connects two S-blossoms
+       in different alternating trees, use the edge to construct an augmenting path.
+       End the search and return the augmenting path.
+     - If _&delta; = &delta;<sub>4</sub>_, expand the corresponding T-blossom.
+
+It may seem complicated, but this is actually easier.
+The code that scans newly labeled S-vertices, no longer needs special treatment of tight edges.
+
+In general, multiple updates of the dual variables are necessary during a single _stage_ of
+the algorithm.
+Remember that a stage is the process of growing alternating trees until an augmenting path is found.
 A stage ends either by augmenting the matching, or by concluding that no further improvement
 is possible.
-Each stage consists of one or more substages.
-A substage scans tight edges to grow the alternating trees.
-When a substage runs out of tight edges, it ends by performing a dual variable update.
-A substage also ends immediately when it finds an augmenting path.
-At the end of each stage, labels and alternating trees are removed.
-
-The matching algorithm ends when a substage ends in a dual variable update limited
-by _&delta;<sub>1</sub>_.
-At that point the matching has maximum weight.

 ### Expanding a blossom

@ -635,161 +661,322 @@ All vertices of sub-blossoms that got label S are inserted into _Q_.
 ![Figure 5](figures/graph5.png) <br>
 *Figure 5: Expanding a T-blossom*

-### Keeping track of least-slack edges
+### Keeping track of the top-level blossom of each vertex

-To perform a dual variable update, the algorithm needs to compute the values
+The algorithm often needs to find the top-level blossom _B(x)_ that contains a given vertex _x_.
+
+A naive implementation may keep this information in an array where the element with
+index _x_ holds a pointer to blossom _B(x)_.
+Lookup in this array would be fast, but keeping the array up-to-date takes too much time.
+There can be _O(n)_ stages, and _O(n)_ blossoms can be created or expanded during a stage,
+and a blossom can contain _O(n)_ vertices,
+therefore the total number of updates to the array could add up to _O(n<sup>3</sup>)_.
+
+To solve this, we use a special data structure: a _concatenable priority queue_.
+Each top-level blossom maintains an instance of this type of queue, containing its vertices.
+Each vertex is a member in precisely one such queue.
+
+To find the top-level blossom _B(x)_ that contains a given vertex _x_, we determine
+the queue instance in which the vertex is a member.
+This takes time _O(log n)_.
+The queue instance corresponds directly to a specific blossom, which we can find
+for example by storing a pointer to the blossom inside the queue instance.
+
+When a new blossom is created, the concatenable queues of its sub-blossoms are merged
+to form one concatenable queue for the new blossom.
+The merged queue contains all vertices of the original queues.
+Merging a pair of queues takes time _O(log n)_.
+To merge the queues of _k_ sub-blossoms, the concatenation step is repeated _k-1_ times,
+taking total time _O(k log n)_.
+
+When a blossom is expanded, its concatenable queue is un-concatenated to recover separate queues
+for the sub-blossoms.
+This also takes time _O(log n)_ for each sub-blossom.
+
+Implementation details of concatenable queues are discussed later in this document.
+
+### Lazy updating of dual variables
+
+During a delta step, the dual variables of labeled vertices and blossoms change as described above.
+Updating these variables directly would take time _O(n)_ per delta step.
+The total number of delta steps during a matching may be _&Theta;(n<sup>2</sup>)_,
+pushing the total time to update dual variables to _O(n<sup>3</sup>)_ which is too slow.
+
+To solve this, [[4]](#galil_micali_gabow1986) describes a technique that stores dual values
+in a _modified_ form which is invariant under delta steps.
+The modified values can be converted back to the true dual values when necessary.
+[[7]](#mehlhorn_schafer2002) describes a slightly different technique which I find easier
+to understand.
+My implementation is very similar to theirs.
+
+The first trick is to keep track of the running sum of _&delta;_ values since the beginning of the algorithm.
+Let's call that number _&Delta;_.
+At the start of the algorithm _&Delta; = 0_, but the value increases as the algorithm goes through delta steps.
+
+For each non-trivial blossom, rather than storing its true dual value, we store a _modified_ dual value:
+
+ - For an S-blossom, the modified dual value is _z'<sub>B</sub> = z<sub>B</sub> - 2 &Delta;_
+ - For a T-blossom, the modified dual value is _z'<sub>B</sub> = z<sub>B</sub> + 2 &Delta;_
+ - For an unlabeled blossom or non-top-level blossom, the modified dual value is equal
+   to the true dual value.
+
+These modified values are invariant under delta steps.
+Thus, there is no need to update the stored values during a delta step.
+
+Since the modified blossom dual value depends on the label (S or T) of the blossom,
+the modified value must be updated whenever the label of the blossom changes.
+This update can be done in constant time, and changing the label of a blossom is
+in any case an explicit step, so this won't increase the asymptotic run time.
+
+For each vertex, rather than storing its true dual value, we store a _modified_ dual value:
+
+ - For an S-vertex, the modified dual value is _u'<sub>x</sub> = u<sub>x</sub> + &Delta;_
+ - For a T-vertex, the modified dual value is _u'<sub>x</sub> = u<sub>x</sub> - offset<sub>B(x)</sub> - &Delta;_
+ - For an unlabeled vertex, the modified dual value is _u'<sub>x</sub> = u<sub>x</sub> - offset<sub>B(x)</sub>_
+
+where _offset<sub>B</sub>_ is an extra variable which is maintained for each top-level blossom.
+
+Again, the modified values are invariant under delta steps, which implies that no update
+to the stored values is necessary during a delta step.
+
+Since the modified vertex dual value depends on the label (S or T) of its top-level blossom,
+an update is necessary when that label changes.
+For S-vertices, we can afford to apply that update directly to the vertices involved.
+This is possible since a vertex becomes an S-vertex at most once per stage.
+
+The situation is more complicated for T-vertices.
+During a stage, a T-vertex can become unlabeled if it is part of a T-blossom that gets expanded.
+The same vertex can again become a T-vertex, then again become unlabeled during a subsequent
+blossom expansion.
+In this way, a vertex can transition between T-vertex and unlabeled vertex up to _O(n)_ times
+within a stage.
+We can not afford to update the stored modified vertex dual so many times.
+This is where the _offset_ variables come in.
+If a blossom becomes a T-blossom, rather than updating the modified duals of all vertices,
+we update only the _offset_ variable of the blossom such that the modified vertex duals
+remain unchanged.
+If a blossom is expanded, we push the _offset_ values down to its sub-blossoms.
+
+### Efficiently computing _&delta;_
+
+To perform a delta step, the algorithm computes the values
 of _&delta;<sub>1</sub>_, _&delta;<sub>2</sub>_, _&delta;<sub>3</sub>_ and _&delta;<sub>4</sub>_
-and determine which edge (_&delta;<sub>2</sub>_, _&delta;<sub>3</sub>_) or
+and determines which edge (_&delta;<sub>2</sub>_, _&delta;<sub>3</sub>_) or
 blossom (_&delta;<sub>4</sub>_) achieves the minimum value.

-The total number of dual updates during a matching may be _&Theta;(n<sup>2</sup>)_.
-Since we want to limit the total number of steps of the matching algorithm to _O(n<sup>3</sup>)_,
-each dual update may use at most _O(n)_ steps.
+A naive implementation might compute _&delta;_ by looping over the vertices, blossoms and edges
+in the graph.
+The total number of delta steps during a matching may be _&Theta;(n<sup>2</sup>)_,
+pushing the total time for _&delta;_ calculations to _O(n<sup>2</sup> m)_ which is much too slow.
+[[4]](#galil_micali_gabow1986) introduces a combination of data structures from which
+the value of _&delta;_ can be computed efficiently.

-We can find _&delta;<sub>1</sub>_ in _O(n)_ steps by simply looping over all vertices
-and checking their dual variables.
-We can find _&delta;<sub>4</sub>_ in _O(n)_ steps by simply looping over all non-trivial blossoms
-(since there are fewer than _n_ non-trivial blossoms).
-We could find _&delta;<sub>2</sub>_ and _&delta;<sub>3</sub>_ by simply looping over
-all edges of the graph in _O(m)_ steps, but that exceeds our budget of _O(n)_ steps.
-So we need better techniques.
+_&delta;<sub>1</sub>_ is the minimum dual value of any S-vertex.
+This value can be computed in constant time.
+The dual value of an unmatched vertex is reduced by _&delta;_ during every delta step.
+Since all vertex duals start with the same dual value _u<sub>start</sub>_,
+all unmatched vertices have dual value _&delta;<sub>1</sub> = u<sub>start</sub> - &Delta;_.
 
-For _&delta;<sub>2</sub>_, we determine the least-slack edge between an S-blossom and unlabeled
-blossom as follows.
-For every vertex _y_ in any unlabeled blossom, keep track of _e<sub>y</sub>_:
-the least-slack edge that connects _y_ to any S-vertex.
-The thing to keep track of is the identity of the edge, not the slack value.
-This information is kept up-to-date as part of the procedure that considers S-vertices.
-The scanning procedure eventually considers all edges _(x, y, w)_ where _x_ is an S-vertex.
-At that moment _e<sub>y</sub>_ is updated if the new edge has lower slack.
+_&delta;<sub>3</sub>_ is half of the minimum slack of any edge between two different S-blossoms.
+To compute this efficiently, we keep edges between S-blossoms in a priority queue.
+The edges are inserted into the queue during scanning of newly labeled S-vertices.
+To compute _&delta;<sub>3</sub>_, we simply find the minimum-priority element of the queue.

-Calculating _&delta;<sub>2</sub>_ then becomes a matter of looping over all vertices _x_,
-checking whether _B(x)_ is unlabeled and calculating the slack of _e<sub>x</sub>_.
+A complication may occur when a new blossom is created.
+Edges that connect different top-level S-blossoms before creation of the new blossom,
+may end up as internal edges inside the newly created blossom.
+This implies that such edges would have to be removed from the _&delta;<sub>3</sub>_ priority queue,
+but that would be quite difficult.
+Instead, we just let those edges stay in the queue.
+When computing the value of _&delta;<sub>3</sub>_, we thus have to check whether the minimum
+element represents an edge between different top-level blossoms.
+If not, we discard such stale elements until we find an edge that does.

-One subtle aspect of this technique is that a T-vertex can loose its label when
-the containing T-blossom gets expanded.
-At that point, we suddenly need to have kept track of its least-slack edge to any S-vertex.
-It is therefore necessary to do the same tracking also for T-vertices.
-So the technique is: For any vertex that is not an S-vertex, track its least-slack edge
-to any S-vertex.
+A complication occurs when dual variables are updated.
+At that point, the slack of any edge between different S-blossoms decreases by _2\*&delta;_.
+But we can not afford to update the priorities of all elements in the queue.
+To solve this, we set the priority of each edge to its _modified slack_.

-Another subtle aspect is that a T-vertex may have a zero-slack edge to an S-vertex.
-Even though these edges are already tight, they must still be tracked.
-If the T-vertex loses its label, this edge needs to be reconsidered by the scanning procedure.
-By including these edges in least-slack edge tracking, they will be rediscovered
-through a _&delta;<sub>2</sub>=0_ update after the vertex becomes unlabeled.
+The _modified slack_ of an edge is defined as follows:

-For _&delta;<sub>3</sub>_, we determine the least-slack edge between any pair of S-blossoms
-as follows.
-For every S-blossom _B_, keep track of _e<sub>B</sub>_:
-the least-slack edge between _B_ and any other S-blossom.
-Note that this is done per S-blossoms, not per S-vertex.
-This information is kept up-to-date as part of the procedure that considers S-vertices.
-Calculating _&delta;<sub>3</sub>_ then becomes a matter of looping over all S-blossoms _B_
-and calculating the slack of _e<sub>B</sub>_.
+$$ \pi'_{x,y} = u'_x + u'_y - w_{x,y} $$

-A complication occurs when S-blossoms are merged.
-Some of the least-slack edges of the sub-blossoms may be internal edges in the merged blossom,
-and therefore irrelevant for _&delta;<sub>3</sub>_.
-As a result, the proper _e<sub>B</sub>_ of the merged blossom may be different from all
-least-slack edges of its sub-blossoms.
-An additional data structure is needed to find _e<sub>B</sub>_ of the merged blossom.
+The modified slack is computed in the same way as true slack, except it uses
+the modified vertex duals instead of true vertex duals.
+Blossom duals are ignored since we will never compute the modified slack of an edge that
+is contained inside a blossom.
+Because modified vertex duals are invariant under delta steps, so is the modified edge slack.
+As a result, the priorities of edges in the priority queue remain unchanged during a delta step.

-For every S-blossom _B_, maintain a list _L<sub>B</sub>_ of edges between _B_ and
-other S-blossoms.
-The purpose of _L<sub>B</sub>_ is to contain, for every other S-blossom, the least-slack edge
-between _B_ and that blossom.
-These lists are kept up-to-date as part of the procedure that considers S-vertices.
-While considering vertex _x_, if edge _(x, y, w)_ has positive slack,
-and _B(y)_ is an S-blossom, the edge is inserted in _L<sub>B(x)</sub>_.
-This may cause _L<sub>B(x)</sub>_ to contain multiple edges to _B(y)_.
-That's okay as long as it definitely contains the least-slack edge to _B(y)_.
+_&delta;<sub>4</sub>_ is half of the minimum dual variable of any T-blossom.
+To compute this efficiently, we keep non-trivial T-blossoms in a priority queue.
+The blossoms are inserted into the queue when they become a T-blossom and removed from
+the queue when they stop being a T-blossom.

-When a new S-blossom is created, form its list _L<sub>B</sub>_ by merging the lists
-of its sub-blossoms.
-Ignore edges that are internal to the merged blossom.
-If there are multiple edges to the same target blossom, keep only the least-slack of these edges.
-Then find _e<sub>B</sub>_ of the merged blossom by simply taking the least-slack edge
-out of _L<sub>B</sub>_.
+A complication occurs when dual variables are updated.
+At that point, the dual variable of any T-blossom decreases by _2\*&delta;_.
+But we can not afford to update the priorities of all elements in the queue.
+To solve this, we set the priority of each blossom to its _modified_ dual value
+_z'<sub>B</sub> = z<sub>B</sub> + 2\*&Delta;_.
+These values are invariant under delta steps.

+_&delta;<sub>2</sub>_ is the minimum slack of any edge between an S-vertex and unlabeled vertex.
+To compute this efficiently, we use a fairly complicated strategy that involves
+three levels of priority queues.
+
+At the lowest level, every T-vertex or unlabeled vertex maintains a separate priority queue
+of edges between itself and any S-vertex.
+Edges are inserted into this queue during scanning of newly labeled S-vertices.
+Note that S-vertices do not maintain this type of queue.
+
+The priorities of edges in these queues are set to their _modified slack_.
+This ensures that the priorities remain unchanged during delta steps.
+The priorities also remain unchanged when the T-vertex becomes unlabeled or the unlabeled
+vertex becomes a T-vertex.
+
+At the middle level, every T-blossom or unlabeled top-level blossom maintains a priority queue
+containing its vertices.
+This is in fact the _concatenable priority queue_ that is maintained by every top-level blossom
+as was described earlier in this document.
+The priority of each vertex in the mid-level queue is set to the minimum priority of any edge
+in the low-level queue of that vertex.
+If edges are added to (or removed from) the low-level queue, the priority of the corresponding
+vertex in the mid-level queue may change.
+If the low-level queue of a vertex is empty, that vertex has priority _Infinity_
+in the mid-level queue.
+
+At the highest level, unlabeled top-level blossoms are tracked in one global priority queue.
+The priority of each blossom in this queue is set to the minimum slack of any edge
+from that blossom to an S-vertex plus _&Delta;_.
+These priorities are invariant under delta steps.
+
+To compute _&delta;<sub>2</sub>_, we find the minimum priority in the high-level queue
+and adjust it by _&Delta;_.
+To find the edge associated with _&delta;<sub>2</sub>_,
+we use the high-level queue to find the unlabeled blossom with minimum priority,
+then use that blossom's mid-level queue to find the vertex with minimum priority,
+then use that vertex's low-level queue to find the edge with minimum priority.
+
+The whole thing is a bit tricky, but it works.
+
+### Re-using alternating trees
+
+According to [[5]](#galil1986), labels and alternating trees should be erased at the end of each stage.
+However, the algorithm can be optimized by keeping some of the labels and re-using them
+in the next stage.
+The optimized algorithm erases _only_ the two alternating trees that are part of
+the augmenting path.
+All blossoms in those two trees lose their labels and become free blossoms again.
+Other alternating trees, which are not involved in the augmenting path, are preserved
+into the next stage, and so are the labels on the blossoms in those trees.
+
+This optimization is well known and is described for example in [[7]](#mehlhorn_schafer2002).
+It does not affect the worst-case asymptotic run time of the algorithm,
+but it provides a significant practical speed up for many types of graphs.
+
+Erasing alternating trees is easy enough, but selectively stripping labels off blossoms
+has a few implications.
+The blossoms that lose their labels need to have their modified dual values updated.
+The T-blossoms additionally need to have their _offset<sub>B</sub>_ variables updated
+to keep the vertex dual values consistent.
+For S-blossoms that lose their labels, the modified vertex dual variables are updated directly.
+
+The various priority queues also need updating.
+Former T-blossoms must be removed from the priority queue for _&delta;<sub>4</sub>_.
+Edges incident on former S-vertices must be removed from the priority queues for _&delta;<sub>3</sub>_ and _&delta;<sub>2</sub>_.
+Finally, S-vertices that become unlabeled need to construct a proper priority queue
+of incident edges to other S-vertices for _&delta;<sub>2</sub>_ tracking.
+This involves visiting every incident edge of every vertex in each S-blossom that loses its label.

 ## Run time of the algorithm

 Every stage of the algorithm either increases the number of matched vertices by 2 or
 ends the matching.
 Therefore the number of stages is at most _n/2_.
-Every stage runs in _O(n<sup>2</sup>)_ steps, therefore the complete algorithm runs in
-_O(n<sup>3</sup>)_ steps.
-
-During each stage, edge scanning is driven by the queue _Q_.
-Every vertex enters _Q_ at most once.
-Each vertex that enters _Q_ has its incident edges scanned, causing every edge in the graph
-to be scanned at most twice per stage.
-Scanning an edge is done in constant time, unless it leads to the discovery of a blossom
-or an augmenting path, which will be separately accounted for.
-Therefore edge scanning needs _O(m)_ steps per stage.
+Every stage runs in time _O((n + m) log n)_, therefore the complete algorithm runs in
+time _O(n (n + m) log n)_.

 Creating a blossom reduces the number of top-level blossoms by at least 2,
 thus limiting the number of simultaneously existing blossoms to _O(n)_.
 Blossoms that are created during a stage become S-blossoms and survive until the end of that stage,
 therefore _O(n)_ blossoms are created during a stage.
+Creating a blossom with _k_ sub-blossoms reduces the number of top-level blossoms by _k-1_,
+thus limiting the total number of sub-blossoms that can be involved in blossom creation
+during a stage to _O(n)_.
+
 Creating a blossom involves tracing the alternating path to the closest common ancestor,
-and some bookkeeping per sub-blossom,
-and inserting new S-vertices _Q_,
-all of which can be done in _O(n)_ steps per blossom creation.
-The cost of managing least-slack edges between S-blossoms will be separately accounted for.
-Therefore blossom creation needs _O(n<sup>2</sup>)_ steps per stage
-(excluding least-slack edge management).
+which takes time _O(k log n)_ for a blossom with _k_ sub-blossoms.
+It also involves bookkeeping per sub-blossom, which takes time _O(log n)_ per sub-blossom.
+It also involves relabeling former T-vertices as S-vertices, but I account for that
+time separately below so I can ignore it here.
+It also involves merging the concatenable queues which track the vertices in top-level blossoms.
+Merging two queues takes time _O(log n)_, therefore merging the queues of all sub-blossoms
+takes time _O(k log n)_.
+Creating a blossom thus takes time _O(k log n)_.
+Blossom creation thus takes total time _O(n log n)_ per stage.

-As part of creating a blossom, a list _L<sub>B</sub>_ of least-slack edges must be formed.
-This involves processing every element of all least-slack edge lists of its sub-blossoms,
-and removing redundant edges from the merged list.
-Merging and removing redundant edges can be done in one sweep via a temporary array indexed
-by target blossom.
-Collect the least-slack edges of the sub-blossoms into this array,
-indexed by the target blossom of the edge,
-keeping only the edge with lowest slack per target blossom.
-Then convert the array back into a list by removing unused indices.
-This takes _O(1)_ steps per candidate edge, plus _O(n)_ steps to manage the temporary array.
-I choose to shift the cost of collecting the candidate edges from the sub-blossoms to
-the actions that inserted those edges into the sub-blossom lists.
-There are two processes which insert edges into _L<sub>B</sub>_: edge scanning and blossom
-creation.
-Edge scanning inserts each graph edge at most twice per stage for a total cost of _O(m)_ steps
-per stage.
-Blossom creation inserts at most _O(n)_ edges per blossom creation.
-Therefore the total cost of S-blossom least-slack edge management is
-_O(m + n<sup>2</sup>) = O(n<sup>2</sup>)_ steps per stage.
+During each stage, a blossom becomes an S-blossom or T-blossom at most once.
+A blossom also becomes unlabeled at most once, at the end of the stage.
+Changing the label of a blossom takes some simple bookkeeping, as well as operations
+on priority queues (_&delta;<sub>4</sub>_ for T-blossoms, _&delta;<sub>2</sub>_ for unlabeled
+blossoms) which take time _O(log n)_ per blossom.
+Assigning label S or removing label S also involves work for the vertices in the blossom
+and their edges, but I account for that time separately below so I can ignore it here.
+Blossom labeling thus takes total time _O(n log n)_ per stage.

-The number of blossom expansions during a stage is _O(n)_.
-Expanding a blossom involves some bookkeeping per sub-blossom,
-and reconstructing the alternating path through the blossom,
-and inserting any new S-vertices into _Q_,
-all of which can be done in _O(n)_ steps per blossom.
-Therefore blossom expansion needs _O(n<sup>2</sup>)_ steps per stage.
+During each stage, a vertex becomes an S-vertex at most once, and an S-vertex becomes
+unlabeled at most once.
+In both cases, the incident edges of the affected vertex are scanned and potentially
+added to or removed from priority queues.
+This involves finding the top-level blossoms of the endpoints of each edge, which
+takes time _O(log n)_ per edge.
+The updates to priority queues also take time _O(log n)_ per edge.
+Edge scanning thus takes total time _O(m log n)_ per stage.
+
+Note that _m &le; n<sup>2</sup>_ therefore _log m &le; 2 log n_.
+This implies that an operation on a priority queue with _m_ elements takes time _O(log n)_.
+
+Expanding a blossom involves some bookkeeping which takes time _O(log n)_ per sub-blossom.
+It also involves splitting the concatenable queue that tracks the vertices in top-level blossoms,
+which takes time _O(log n)_ per sub-blossom.
+In case of a T-blossom, it also involves reconstructing the alternating path through
+the blossom which takes time _O(k log n)_ for _k_ sub-blossoms.
+Also in case of a T-blossom, some sub-blossoms will become S-blossoms and their
+vertices become S-vertices, but I have already accounted for that cost above
+so I can ignore it here.
+Expanding a blossom thus takes time _O(k log n)_.
+Any blossom is involved as a sub-blossom in an expanding blossom at most once per stage.
+Blossom expansion thus takes total time _O(n log n)_ per stage.

 The length of an augmenting path is _O(n)_.
-Tracing the augmenting path and augmenting the matching along the path can be done in _O(n)_ steps.
-Augmenting through a blossom takes a number of steps that is proportional in the number of
+Tracing the augmenting path and augmenting the matching along the path can be done
+in _O(n log n)_ steps.
+Augmenting through a blossom takes a number of steps that is proportional to the number of
 its sub-blossoms.
 Since there are fewer than _n_ non-trivial blossoms, the total cost of augmenting through
 blossoms is _O(n)_ steps.
-Therefore the total cost of augmenting is _O(n)_ steps per stage.
+Augmenting thus takes total time _O(n log n)_ per stage.

-A dual variable update limited by _&delta;<sub>1</sub>_ ends the algorithm and therefore
-happens at most once.
-An update limited by _&delta;<sub>2</sub>_ labels a previously labeled blossom
-and therefore happens _O(n)_ times per stage.
-An update limited by _&delta;<sub>3</sub>_ either creates a blossom or finds an augmenting path
-and therefore happens _O(n)_ times per stage.
-An update limited by _&delta;<sub>4</sub>_ expands a blossom and therefore happens
-_O(n)_ times per stage.
-Therefore the number of dual variable updates is _O(n)_ per stage.
-The cost of calculating the _&delta;_ values is _O(n)_ per update as discussed above.
-Applying changes to the dual variables can be done by looping over all vertices and looping over
-all top-level blossoms in _O(n)_ steps per update.
-Therefore the total cost of dual variable updates is _O(n<sup>2</sup>)_ per stage.
+A delta step limited by _&delta;<sub>1</sub>_ ends the algorithm and therefore happens at most once.
+A _&delta;<sub>2</sub>_ step assigns a label to a previously unlabeled blossom and therefore
+happens _O(n)_ times per stage.
+A _&delta;<sub>3</sub>_ step either creates a blossom or finds an augmenting path and therefore
+happens _O(n)_ times per stage.
+A _&delta;<sub>4</sub>_ step expands a blossom and therefore happens _O(n)_ times per stage.
+Therefore the number of delta steps is _O(n)_ per stage.

+Calculating _&delta;<sub>1</sub>_ takes constant time.
+Calculating _&delta;<sub>2</sub>_ and _&delta;<sub>4</sub>_ requires a constant number
+of lookups in priority queues which takes time _O(log n)_ per delta step.
+During calculation of _&delta;<sub>3</sub>_, it may be necessary to remove stage edges
+from the priority queue.
+Since every edge is inserted into the priority queue at most once per stage,
+at most _O(m)_ edges are removed per stage, which takes total time _O(m log n)_ per stage.
+Calculation of _&delta;_ thus takes total time _O((n + m) log n)_ per stage.
+
+Applying updates to dual variables is done in a lazy fashion as discussed above.
+The only variable that is updated directly is _&Delta;_, which takes time _O(1)_ per delta step.
+Updating dual variables thus takes total time _O(n)_ per stage.

 ## Implementation details

@ -819,17 +1006,100 @@ Every edge therefore appears in two adjacency lists.
 These data structures are initialized at the start of the matching algorithm
 and never change while the algorithm runs.

+#### Priority queue
+
+Priority queues are used for a number of purposes:
+
+ - a priority queue to find the least-slack edge between S-blossoms;
+ - a priority queue to find the minimum-dual T-blossom;
+ - a priority queue to find the unlabeled blossom with least-slack edge to an S-blossom;
+ - a separate priority queue per vertex to find the least-slack edge between that vertex
+   and any S-vertex.
+
+These queues are implemented as a binary heaps.
+This type of queue supports the following operations:
+
+ - _insert_ a new element with specified priority in time _O(log n)_;
+ - find the element with _minimum_ priority in time _O(1)_;
+ - _delete_ a specified element in time _O(log n)_;
+ - _change_ the priority of a specified element in time _O(log n)_.
+
+#### Concatenable priority queue
+
+Each top-level blossom maintains a concatenable priority queue containing its vertices.
+We use a specific type of concatenable queue that supports the following operations
+[[4]](#galil_micali_gabow1986) [[8]](#aho_hopcroft_ullman1974):
+
+ - _create_ a new queue containing 1 new element;
+ - find the element with _minimum_ priority in time _O(1)_;
+ - _change_ the priority of a given element in time _O(log n)_;
+ - _merge_ two queues into one new queue in time _O(log n)_;
+ - _split_ a queue, thus undoing the previous _merge_ step in time _O(log n)_.
+
+The efficient _merge_ and _split_ operations make it possible to adapt the queue during
+blossom creation and blossom expansion steps.
+The priorities in the queue are used to find, for a given top-level blossom, its vertex
+with least-slack edge to an S-blossom.
+
+The concatenable queue is implemented as a balanced tree, specifically a _2-3 tree_.
+Each internal node of a 2-3 tree has either 2 or 3 children.
+The leaf nodes of the tree represent the elements of the queue.
+All leaf nodes are at the same distance from the root.
+Each node has a pointer to its parent, and each internal node has pointers to its children.
+Each internal node also stores its height (distance to its leaf nodes).
+
+Only leaf nodes have a priority.
+However, each internal node maintains a pointer to the leaf node with minimum priority
+within its subtree.
+As a consequence, the root of the tree has a pointer to the least-priority element in the queue.
+To keep this information consistent, any change in the priority of a leaf node must
+be followed by updating the internal nodes along a path from the leaf node to the root.
+The same must be done when the structure of the tree is adjusted.
+
+The left-to-right order of the leaf nodes is preserved during all operations, including _merge_
+and _split_.
+When trees _A_ and _B_ are merged, the sequence of leaf nodes in the merged tree will consist of
+the leaf nodes of _A_ followed by the leaf nodes of _B_.
+Note that the left-to-right order of the leaf nodes is unrelated to the priorities of the elements.
+
+To merge two trees, the root of the smaller tree is inserted as a child of an appropriate node
+in the larger tree.
+Subsequent adjustments are needed restore the consistency the 2-3 tree and to update
+the minimum-element pointers of the internal nodes along a path from the insertion point
+to the root of the merged tree.
+This can be done in a number of steps proportional to the difference in height between
+the two trees, which is in any case _O(log n)_.
+
+To split a tree, a _split node_ is identified: the left-most leaf node that must end up
+in the right-side tree after splitting.
+Internal nodes are deleted along the path from the _split node_ to the root of the tree.
+This creates a forest of disconnected subtrees on the left side of the path,
+and a similar forest of subtrees on the right side of the split path.
+The left-side subtrees are reassembled into a single tree through a series of _merge_ steps.
+Although _O(log n)_ merge steps may be needed, the total time required for reassembly
+is also _O(log n)_ because each merge step combines trees of similar height.
+A similar reassembly is done for the forest on the right side of the split path.
+
+The concatenable queues have an additional purpose in the matching algorithm:
+finding the top-level blossom that contains a given vertex.
+To do this, we assign a _name_ to each concatenable queue instance, which is simply
+a pointer to the top-level blossom that maintains the queue.
+An extra operation is defined:
+_find_ the name of the queue instance that contains a given element in time _O(log n)_.
+Implementing the _find_ operation is easy:
+Starting at the leaf node that represents the element, follow _parent_ pointers
+to the root of the tree.
+The root node contains the name of the queue.
+
 #### General data

 `vertex_mate[x] = y` if the edge between vertex _x_ and vertex _y_ is matched. <br>
 `vertex_mate[x] = -1` if vertex _x_ is unmatched.

-`vertex_top_blossom[x] =` pointer to _B(x)_, the top-level blossom that contains vertex _x_.
+`vertex_dual[x]` holds the modified vertex dual _u'<sub>x</sub>_.

-`vertex_dual[x]` holds the value of _u<sub>x</sub>_.
-
-A FIFO queue holds vertex indices of S-vertices whose edges have not yet been scanned.
-Vertices are inserted in this queue as soon as their top-level blossom gets label S.
+A global list holds vertex indices of S-vertices whose edges have not yet been scanned.
+Vertices are inserted in this list when their top-level blossom gets label S.

 #### Blossoms

@ -842,56 +1112,28 @@ Both types of blossoms are represented as class instances with the following att
 * `B.label` is `S` or `T` or `None`
 * `B.tree_edge = (x, y)` if _B_ is a labeled top-level blossom, where _y_ is a vertex in _B_
   and _(x, y)_ is the edge that links _B_ to its parent in the alternating tree.
+ * `B.tree_blossoms` points to a list of blossoms in the same alternating tree, if _B_
+   is a labeled top-level blossom.
+ * `B.vertex_dual_offset` holds the pending change to vertex duals _offset<sub>B</sub>_.

 A non-trivial blossom additionally has the following attributes:

 * `B.subblossoms` is an array of pointers to the sub-blossoms of _B_,
   starting with the sub-blossom that contains the base vertex.
 * `B.edges` is an array of alternating edges connecting the sub-blossoms.
- * `B.dual_var` holds the value of _z<sub>B</sub>_.
+ * `B.dual_var` holds the modified blossom dual _z'<sub>B</sub>_.

 Single-vertex blossoms are kept in an array indexed by vertex index. <br>
-Non-trivial blossoms are kept in a separate array. <br>
+Non-trivial blossoms are kept in a separate list. <br>
 These arrays are used to iterate over blossoms and to find the trivial blossom
 that consists of a given vertex.

-#### Least-slack edge tracking
-
-`vertex_best_edge[x]` is an array holding _e<sub>x</sub>_, the edge index of
-the least-slack edge between vertex _x_ and any S-vertex, or -1 if there is no such edge.
-This value is only meaningful if _x_ is a T-vertex or unlabeled vertex.
-
-`B.best_edge` is a blossom attribute holding _e<sub>B</sub>_, the edge index of the least-slack
-edge between blossom _B_ and any other S-blossom, or -1 if there is no such edge.
-This value is only meaningful if _B_ is a top-level S-blossom.
-
-For non-trivial S-blossoms _B_, attribute `B.best_edge_set` holds the list _L<sub>B</sub>_
-of potential least-slack edges to other blossoms.
-This list is not maintained for single-vertex blossoms, since _L<sub>B</sub>_ of a single vertex
-can be efficiently constructed from its adjacency list.
-
 #### Memory usage

 The data structures described above use a constant amount of memory per vertex and per edge
 and per blossom.
 Therefore the total memory requirement is _O(m + n)_.

-The memory usage of _L<sub>B</sub>_ is a little tricky.
-Any given list _L<sub>B</sub>_ can have length _O(n)_, and _O(n)_ of these lists can exist
-simultaneously.
-Naively allocating space for _O(n)_ elements per list will drive memory usage
-up to _O(n<sup>2</sup>)_.
-However, at any moment, an edge can be in at most two of these lists, therefore the sum
-of the lengths of these lists is limited to _O(m)_.
-A possible solution is to implement the _L<sub>B</sub>_ as linked lists.
-
-### Performance critical routines
-
-Calculations that happen very frequently in the algorithm are:
-determining the top-level blossom of a given vertex, and calculating the slack of a given edge.
-These calculations must run in constant time per call in any case, but it makes sense to put
-some extra effort into making these calculations _fast_.
-
 ### Recursion

 Certain tasks in the algorithm are recursive in nature:
@ -948,59 +1190,41 @@ Proof by induction that all vertex duals are multiples of 0.5 and all blossom du
 - Blossom duals increase or decrease by _2\*&delta;_,
   therefore updated blossom duals are still integers.

-The value of vertex dual variables and blossom dual variables never exceeds the
-greatest edge weight in the graph.
-This may be helpful for choosing an integer data type for the dual variables.
+It is useful to know that (modified) dual variables and (modified) edge slacks
+are limited to a finite range of values which depends on the maximum edge weight.
+This may be helpful when choosing an integer data type for these variables.
 (Alternatively, choose a programming language with unlimited integer range.
 This is perhaps the thing I love most about Python.)

-Proof that dual variables do not exceed _max-weight_:
-
- - Vertex dual variables start at _u<sub>x</sub> = 0.5\*max-weight_.
- - While the algorithm runs, there is at least one vertex which has been unmatched
-   since the beginning.
-   This vertex has always had label S, therefore its dual always decreased by _&delta;_
-   during a dual variable update.
-   Since it started at _0.5\*max-weight_ and can not become negative,
-   the sum of _&delta;_ over all dual variable updates can not exceed _0.5\*max-weight_.
- - Vertex dual variables increase by at most _&delta;_ per update.
-   Therefore no vertex dual can increase by more than _0.5\*max-weight_ in total.
-   Therefore no vertex dual can exceed _max-weight_.
- - Blossom dual variables start at _z<sub>B</sub> = 0_.
- - Blossom dual variables increase by at most _2\*&delta;_ per update.
-   Therefore no blossom dual can increase by more than _max-weight_ in total.
-   Therefore no blossom dual can exceed _max-weight_.
+ - The value of _&Delta;_ (sum over _&delta;_ steps) does not exceed _maxweight / 2_.
+   Proof:
+    - Vertex dual variables start at _u<sub>x</sub> = maxweight_ / 2.
+    - While the algorithm runs, there is at least one vertex which has been unmatched
+      since the beginning.
+      This vertex has always had label S, therefore its dual is _maxweight/2 - &Delta;_.
+      Vertex deltas can not be negative, therefore _&Delta; &le; maxweight/2_.
+ - Vertex duals are limited to the range 0 to _maxweight_.
+ - Blossom duals are limited to the range 0 to _maxweight_.
+ - Edge slack is limited to the range 0 to _2\*maxweight_.
+ - Modified vertex duals are limited to the range 0 to _1.5\*maxweight_.
+ - Modified blossom duals are limited to the range _-maxweight to 2\*maxweight_.
+ - Modified edge slack is limited to the range 0 to _3\*maxweight_.
+ - Dual offsets are limited to the range _-maxweight/2_ to _maxweight/2_.

 ### Handling floating point edge weights

 Floating point calculations are subject to rounding errors.
-This has two consequences for the matching algorithm:
+As a result, the algorithm may return a matching which has slightly lower weight than
+the actual maximum weight.

-  - The algorithm may return a matching which has slightly lower weight than
-    the actual maximum weight.
+The algorithm will allways return a valid matching, even if rounding errors occur.
+Floating point comparisons affect which actions are taken during delta steps,
+and thus eventually determine which edges are matched.
+But the overall structure of the algorithm guarantees that it will eventually return
+a valid (if possibly suboptimal) matching.

-  - The algorithm may not reliably recognize tight edges.
-    To check whether an edge is tight, its slack is compared to zero.
-    Rounding errors may cause the slack to appear positive even when an exact calculation
-    would show it to be zero.
-    The slack of some edges may even become slightly negative.
-
-    I believe this does not affect the correctness of the algorithm.
-    An edge that should be tight but is not recognized as tight due to rounding errors,
-    can be pulled tight through an additional dual variable update.
-    As side-effect of this update, the edge will immediately be used to grow the alternating tree,
-    or construct a blossom or augmenting path.
-    This mechanism allows the algorithm to make progress, even if slack comparisons
-    are repeatedly thrown off by rounding errors.
-    Rounding errors may cause the algorithm to perform more dual variable updates
-    than strictly necessary.
-    But this will still not cause the run time of the algorithm to exceed _O(n<sup>3</sup>)_.
-
-It seems to me that the matching algorithm is stable for floating point weights.
-And it seems to me that it returns a matching which is close to optimal,
-and could have been optimal if edge weights were changed by small amounts.
-
-I must admit these arguments are mostly based on intuition.
+The most challenging cases are probably graphs with edge weights that differ by many orders
+of magnitude.
 Unfortunately I don't know how to properly analyze the floating point accuracy of this algorithm.

 ### Finding a maximum weight matching out of all maximum cardinality matchings
@ -1057,7 +1281,14 @@ changing all weights by the same amount doesn't change which of these matchings
    ([link](https://dl.acm.org/doi/abs/10.5555/320176.320229))
    ([pdf](https://dl.acm.org/doi/pdf/10.5555/320176.320229))

+ 7. <a id="mehlhorn_schafer2002"></a>
+    Kurt Mehlhorn, Guido Schäfer, "Implementation of O(nm log(n)) Weighted Matchings in General Graphs: The Power of Data Structures", _Journal of Experimental Algorithmics vol. 7_, 2002.
+    ([link](https://dl.acm.org/doi/10.1145/944618.944622))
+
+ 8. <a id="aho_hopcroft_ullman1974"></a>
+    Alfred V. Aho, John E. Hopcroft, Jeffrey D. Ullman,
+    _The Design and Analysis of Computer Algorithms_, Addison-Wesley, 1974.

 ---
-Written in 2023 by Joris van Rantwijk.
+Written in 2023-2024 by Joris van Rantwijk.
 This work is licensed under [CC BY-ND 4.0](https://creativecommons.org/licenses/by-nd/4.0/).
--- a/python/mwmatching/init.py
+++ b/python/mwmatching/init.py
@ -0,0 +1,11 @@
+"""
+Algorithm for finding a maximum weight matching in general graphs.
+"""
+
+__all__ = ["maximum_weight_matching",
+           "adjust_weights_for_maximum_cardinality_matching",
+           "MatchingError"]
+
+from .algorithm import (maximum_weight_matching,
+                        adjust_weights_for_maximum_cardinality_matching,
+                        MatchingError)
--- a/python/mwmatching/algorithm.py
+++ b/python/mwmatching/algorithm.py
--- a/python/mwmatching/datastruct.py
+++ b/python/mwmatching/datastruct.py
@ -0,0 +1,634 @@
+"""Data structures for matching."""
+
+from __future__ import annotations
+
+from typing import Generic, Optional, TypeVar
+
+
+_NameT = TypeVar("_NameT")
+_NameT2 = TypeVar("_NameT2")
+_ElemT = TypeVar("_ElemT")
+_ElemT2 = TypeVar("_ElemT2")
+
+
+class ConcatenableQueue(Generic[_NameT, _ElemT]):
+    """Priority queue supporting efficient merge and split operations.
+
+    This is a combination of a disjoint set and a priority queue.
+    A queue has a "name", which can be any Python object.
+    Each element has associated "data", which can be any Python object.
+    Each element has a priority.
+
+    The following operations can be done efficiently:
+     - Create a new queue containing one new element.
+     - Find the name of the queue that contains a given element.
+     - Change the priority of a given element.
+     - Find the element with lowest priority in a given queue.
+     - Merge two or more queues.
+     - Undo a previous merge step.
+
+    This data structure is implemented as a 2-3 tree with minimum-priority
+    tracking added to it.
+    """
+
+    __slots__ = ("name", "tree", "first_node", "sub_queues")
+
+    class BaseNode(Generic[_NameT2, _ElemT2]):
+        """Node in the 2-3 tree."""
+
+        __slots__ = ("owner", "min_node", "height", "parent", "childs")
+
+        def __init__(self,
+                     min_node: ConcatenableQueue.Node[_NameT2, _ElemT2],
+                     height: int
+                     ) -> None:
+            """Initialize a new node."""
+            self.owner: Optional[ConcatenableQueue[_NameT2, _ElemT2]] = None
+            self.min_node = min_node
+            self.height = height
+            self.parent: Optional[ConcatenableQueue.BaseNode[_NameT2,
+                                                             _ElemT2]]
+            self.parent = None
+            self.childs: list[ConcatenableQueue.BaseNode[_NameT2, _ElemT2]]
+            self.childs = []
+
+    class Node(BaseNode[_NameT2, _ElemT2]):
+        """Leaf node in the 2-3 tree, representing an element in the queue."""
+
+        __slots__ = ("data", "prio")
+
+        def __init__(self, data: _ElemT2, prio: float) -> None:
+            """Initialize a new leaf node.
+
+            This method should not be called directly.
+            Instead, call ConcatenableQueue.insert().
+            """
+            super().__init__(min_node=self, height=0)
+            self.data = data
+            self.prio = prio
+
+        def find(self) -> _NameT2:
+            """Return the name of the queue that contains this element.
+
+            This function takes time O(log(n)).
+            """
+            node: ConcatenableQueue.BaseNode[_NameT2, _ElemT2] = self
+            while node.parent is not None:
+                node = node.parent
+            assert node.owner is not None
+            return node.owner.name
+
+        def set_prio(self, prio: float) -> None:
+            """Change the priority of this element.
+
+            This function takes time O(log(n)).
+            """
+            self.prio = prio
+            node = self.parent
+            while node is not None:
+                min_node = node.childs[0].min_node
+                for child in node.childs[1:]:
+                    if child.min_node.prio < min_node.prio:
+                        min_node = child.min_node
+                node.min_node = min_node
+                node = node.parent
+
+    def __init__(self, name: _NameT) -> None:
+        """Initialize an empty queue.
+
+        This function takes time O(1).
+
+        Parameters:
+            name: Name to assign to the new queue.
+        """
+        self.name = name
+        self.tree: Optional[ConcatenableQueue.BaseNode[_NameT, _ElemT]] = None
+        self.first_node: Optional[ConcatenableQueue.Node[_NameT, _ElemT]]
+        self.first_node = None
+        self.sub_queues: list[ConcatenableQueue[_NameT, _ElemT]] = []
+
+    def clear(self) -> None:
+        """Remove all elements from the queue.
+
+        This function takes time O(n).
+        """
+        node = self.tree
+        self.tree = None
+        self.first_node = None
+        self.sub_queues = []
+
+        # Wipe pointers to enable refcounted garbage collection.
+        if node is not None:
+            node.owner = None
+        while node is not None:
+            node.min_node = None  # type: ignore
+            prev_node = node
+            if node.childs:
+                node = node.childs.pop()
+            else:
+                node = node.parent
+                prev_node.parent = None
+
+    def insert(self, elem: _ElemT, prio: float) -> Node[_NameT, _ElemT]:
+        """Insert an element into the empty queue.
+
+        This function can only be used if the queue is empty.
+        Non-empty queues can grow only by merging.
+
+        This function takes time O(1).
+
+        Parameters:
+            elem: Element to insert.
+            prio: Initial priority of the new element.
+        """
+        assert self.tree is None
+        self.tree = ConcatenableQueue.Node(elem, prio)
+        self.tree.owner = self
+        self.first_node = self.tree
+        return self.tree
+
+    def min_prio(self) -> float:
+        """Return the minimum priority of any element in the queue.
+
+        The queue must be non-empty.
+        This function takes time O(1).
+        """
+        node = self.tree
+        assert node is not None
+        return node.min_node.prio
+
+    def min_elem(self) -> _ElemT:
+        """Return the element with minimum priority.
+
+        The queue must be non-empty.
+        This function takes time O(1).
+        """
+        node = self.tree
+        assert node is not None
+        return node.min_node.data
+
+    def merge(self,
+              sub_queues: list[ConcatenableQueue[_NameT, _ElemT]]
+              ) -> None:
+        """Merge the specified queues.
+
+        This queue must inititially be empty.
+        All specified sub-queues must initially be non-empty.
+
+        This function removes all elements from the specified sub-queues
+        and adds them to this queue.
+
+        After merging, this queue retains a reference to the list of
+        sub-queues.
+
+        This function takes time O(len(sub_queues) * log(n)).
+        """
+        assert self.tree is None
+        assert not self.sub_queues
+        assert sub_queues
+
+        # Keep the list of sub-queues.
+        self.sub_queues = sub_queues
+
+        # Move the root node from the first sub-queue to this queue.
+        # Clear its owner pointer.
+        self.tree = sub_queues[0].tree
+        self.first_node = sub_queues[0].first_node
+        assert self.tree is not None
+        sub_queues[0].tree = None
+        self.tree.owner = None
+
+        # Merge remaining sub-queues.
+        for sub in sub_queues[1:]:
+
+            # Pull the root node from the sub-queue.
+            # Clear its owner pointer.
+            subtree = sub.tree
+            assert subtree is not None
+            assert subtree.owner is sub
+            subtree.owner = None
+
+            # Merge our current tree with the tree from the sub-queue.
+            self.tree = self._join(self.tree, subtree)
+
+        # Put the owner pointer in the root node.
+        self.tree.owner = self
+
+    def split(self) -> None:
+        """Undo the merge step that filled this queue.
+
+        Remove all elements from this queue and put them back in
+        the sub-queues from which they came.
+
+        After splitting, this queue will be empty.
+
+        This function takes time O(k * log(n)).
+        """
+        assert self.tree is not None
+        assert self.sub_queues
+
+        # Clear the owner pointer from the root node.
+        assert self.tree.owner is self
+        self.tree.owner = None
+
+        # Split the tree to reconstruct each sub-queue.
+        for sub in self.sub_queues[:0:-1]:
+
+            assert sub.first_node is not None
+            (tree, rtree) = self._split_tree(sub.first_node)
+
+            # Assign the right tree to the sub-queue.
+            sub.tree = rtree
+            rtree.owner = sub
+
+        # Put the remaining tree in the first sub-queue.
+        self.sub_queues[0].tree = tree
+        tree.owner = self.sub_queues[0]
+
+        # Make this queue empty.
+        self.tree = None
+        self.first_node = None
+        self.sub_queues = []
+
+    @staticmethod
+    def _repair_node(node: BaseNode[_NameT, _ElemT]) -> None:
+        """Repair min_prio attribute of an internal node."""
+        min_node = node.childs[0].min_node
+        for child in node.childs[1:]:
+            if child.min_node.prio < min_node.prio:
+                min_node = child.min_node
+        node.min_node = min_node
+
+    @staticmethod
+    def _new_internal_node(ltree: BaseNode[_NameT, _ElemT],
+                           rtree: BaseNode[_NameT, _ElemT]
+                           ) -> BaseNode[_NameT, _ElemT]:
+        """Create a new internal node with 2 child nodes."""
+        assert ltree.height == rtree.height
+        height = ltree.height + 1
+        if ltree.min_node.prio <= rtree.min_node.prio:
+            min_node = ltree.min_node
+        else:
+            min_node = rtree.min_node
+        node = ConcatenableQueue.BaseNode(min_node, height)
+        node.childs = [ltree, rtree]
+        ltree.parent = node
+        rtree.parent = node
+        return node
+
+    def _join_right(self,
+                    ltree: BaseNode[_NameT, _ElemT],
+                    rtree: BaseNode[_NameT, _ElemT]
+                    ) -> BaseNode[_NameT, _ElemT]:
+        """Join two trees together.
+
+        The initial left subtree must be higher than the right subtree.
+
+        Return the root node of the joined tree.
+        """
+
+        # Descend down the right spine of the left tree until we
+        # reach a node just above the right tree.
+        node = ltree
+        while node.height > rtree.height + 1:
+            node = node.childs[-1]
+
+        assert node.height == rtree.height + 1
+
+        # Find a node in the left tree to insert the right tree as child.
+        while len(node.childs) == 3:
+            # This node already has 3 childs so we can not add the right tree.
+            # Rearrange into 2 nodes with 2 childs each, then solve it
+            # at the parent level.
+            #
+            #       N                     N       R'
+            #     / | \                  / \     / \
+            #    /  |  \        --->    /   \   /   \
+            #   A   B   C   R           A   B   C   R
+            #
+            child = node.childs.pop()
+            self._repair_node(node)
+            rtree = self._new_internal_node(child, rtree)
+            if node.parent is None:
+                # Create a new root node.
+                return self._new_internal_node(node, rtree)
+            node = node.parent
+
+        # Insert the right tree as child of this node.
+        assert len(node.childs) < 3
+        node.childs.append(rtree)
+        rtree.parent = node
+
+        # Repair min-prio pointers of ancestors.
+        while True:
+            self._repair_node(node)
+            if node.parent is None:
+                break
+            node = node.parent
+
+        return node
+
+    def _join_left(self,
+                   ltree: BaseNode[_NameT, _ElemT],
+                   rtree: BaseNode[_NameT, _ElemT]
+                   ) -> BaseNode[_NameT, _ElemT]:
+        """Join two trees together.
+
+        The initial left subtree must be lower than the right subtree.
+
+        Return the root node of the joined tree.
+        """
+
+        # Descend down the left spine of the right tree until we
+        # reach a node just above the left tree.
+        node = rtree
+        while node.height > ltree.height + 1:
+            node = node.childs[0]
+
+        assert node.height == ltree.height + 1
+
+        # Find a node in the right tree to insert the left tree as child.
+        while len(node.childs) == 3:
+            # This node already has 3 childs so we can not add the left tree.
+            # Rearrange into 2 nodes with 2 childs each, then solve it
+            # at the parent level.
+            #
+            #          N                L'      N
+            #        / | \             / \     / \
+            #       /  |  \    --->   /   \   /   \
+            #  L   A   B   C          L   A   B   C
+            #
+            child = node.childs.pop(0)
+            self._repair_node(node)
+            ltree = self._new_internal_node(ltree, child)
+            if node.parent is None:
+                # Create a new root node.
+                return self._new_internal_node(ltree, node)
+            node = node.parent
+
+        # Insert the left tree as child of this node.
+        assert len(node.childs) < 3
+        node.childs.insert(0, ltree)
+        ltree.parent = node
+
+        # Repair min-prio pointers of ancestors.
+        while True:
+            self._repair_node(node)
+            if node.parent is None:
+                break
+            node = node.parent
+
+        return node
+
+    def _join(self,
+              ltree: BaseNode[_NameT, _ElemT],
+              rtree: BaseNode[_NameT, _ElemT]
+              ) -> BaseNode[_NameT, _ElemT]:
+        """Join two trees together.
+
+        The left and right subtree must be consistent 2-3 trees.
+        Initial parent pointers of these subtrees are ignored.
+
+        Return the root node of the joined tree.
+        """
+        if ltree.height > rtree.height:
+            return self._join_right(ltree, rtree)
+        elif ltree.height < rtree.height:
+            return self._join_left(ltree, rtree)
+        else:
+            return self._new_internal_node(ltree, rtree)
+
+    def _split_tree(self,
+                    split_node: BaseNode[_NameT, _ElemT]
+                    ) -> tuple[BaseNode[_NameT, _ElemT],
+                               BaseNode[_NameT, _ElemT]]:
+        """Split a tree on a specified node.
+
+        Two new trees will be constructed.
+        Leaf nodes to the left of "split_node" will go to the left tree.
+        Leaf nodes to the right of "split_node", and "split_node" itself,
+        will go to the right tree.
+
+        Return tuple (ltree, rtree).
+        """
+
+        # Detach "split_node" from its parent.
+        # Assign it to the right tree.
+        parent = split_node.parent
+        split_node.parent = None
+
+        # The left tree is initially empty.
+        # The right tree initially contains only "split_node".
+        ltree: Optional[ConcatenableQueue.BaseNode[_NameT, _ElemT]] = None
+        rtree = split_node
+
+        # Walk up to the root of the tree.
+        # On the way up, detach each node from its parent and join its
+        # child nodes to the appropriate tree.
+        node = split_node
+        while parent is not None:
+
+            # Ascend to the parent node.
+            child = node
+            node = parent
+            parent = node.parent
+
+            # Detach "node" from its parent.
+            node.parent = None
+
+            if len(node.childs) == 3:
+                if node.childs[0] is child:
+                    # "node" has 3 child nodes.
+                    # Its left subtree has already been split.
+                    # Turn it into a 2-node and join it to the right tree.
+                    node.childs.pop(0)
+                    self._repair_node(node)
+                    rtree = self._join(rtree, node)
+                elif node.childs[2] is child:
+                    # "node" has 3 child nodes.
+                    # Its right subtree has already been split.
+                    # Turn it into a 2-node and join it to the left tree.
+                    node.childs.pop()
+                    self._repair_node(node)
+                    if ltree is None:
+                        ltree = node
+                    else:
+                        ltree = self._join(node, ltree)
+                else:
+                    # "node has 3 child nodes.
+                    # Its middle subtree has already been split.
+                    # Join its left child to the left tree, and its right
+                    # child to the right tree, then delete "node".
+                    node.childs[0].parent = None
+                    node.childs[2].parent = None
+                    if ltree is None:
+                        ltree = node.childs[0]
+                    else:
+                        ltree = self._join(node.childs[0], ltree)
+                    rtree = self._join(rtree, node.childs[2])
+
+            elif node.childs[0] is child:
+                # "node" has 2 child nodes.
+                # Its left subtree has already been split.
+                # Join its right child to the right tree, then delete "node".
+                node.childs[1].parent = None
+                rtree = self._join(rtree, node.childs[1])
+
+            else:
+                # "node" has 2 child nodes.
+                # Its right subtree has already been split.
+                # Join its left child to the left tree, then delete "node".
+                node.childs[0].parent = None
+                if ltree is None:
+                    ltree = node.childs[0]
+                else:
+                    ltree = self._join(node.childs[0], ltree)
+
+        assert ltree is not None
+        return (ltree, rtree)
+
+
+class PriorityQueue(Generic[_ElemT]):
+    """Priority queue based on a binary heap."""
+
+    __slots__ = ("heap", )
+
+    class Node(Generic[_ElemT2]):
+        """Node in the priority queue."""
+
+        __slots__ = ("index", "prio", "data")
+
+        def __init__(
+                self,
+                index: int,
+                prio: float,
+                data: _ElemT2
+                ) -> None:
+            self.index = index
+            self.prio = prio
+            self.data = data
+
+    def __init__(self) -> None:
+        """Initialize an empty queue."""
+        self.heap: list[PriorityQueue.Node[_ElemT]] = []
+
+    def clear(self) -> None:
+        """Remove all elements from the queue.
+
+        This function takes time O(n).
+        """
+        self.heap.clear()
+
+    def empty(self) -> bool:
+        """Return True if the queue is empty."""
+        return (not self.heap)
+
+    def find_min(self) -> Node[_ElemT]:
+        """Return the minimum-priority node.
+
+        This function takes time O(1).
+        """
+        if not self.heap:
+            raise IndexError("Queue is empty")
+        return self.heap[0]
+
+    def _sift_up(self, index: int) -> None:
+        """Repair the heap along an ascending path to the root."""
+        node = self.heap[index]
+        prio = node.prio
+
+        pos = index
+        while pos > 0:
+            tpos = (pos - 1) // 2
+            tnode = self.heap[tpos]
+            if tnode.prio <= prio:
+                break
+            tnode.index = pos
+            self.heap[pos] = tnode
+            pos = tpos
+
+        if pos != index:
+            node.index = pos
+            self.heap[pos] = node
+
+    def _sift_down(self, index: int) -> None:
+        """Repair the heap along a descending path."""
+        num_elem = len(self.heap)
+        node = self.heap[index]
+        prio = node.prio
+
+        pos = index
+        while True:
+            tpos = 2 * pos + 1
+            if tpos >= num_elem:
+                break
+            tnode = self.heap[tpos]
+
+            qpos = tpos + 1
+            if qpos < num_elem:
+                qnode = self.heap[qpos]
+                if qnode.prio <= tnode.prio:
+                    tpos = qpos
+                    tnode = qnode
+
+            if tnode.prio >= prio:
+                break
+
+            tnode.index = pos
+            self.heap[pos] = tnode
+            pos = tpos
+
+        if pos != index:
+            node.index = pos
+            self.heap[pos] = node
+
+    def insert(self, prio: float, data: _ElemT) -> Node:
+        """Insert a new element into the queue.
+
+        This function takes time O(log(n)).
+
+        Returns:
+            Node that represents the new element.
+        """
+        new_index = len(self.heap)
+        node = self.Node(new_index, prio, data)
+        self.heap.append(node)
+        self._sift_up(new_index)
+        return node
+
+    def delete(self, elem: Node[_ElemT]) -> None:
+        """Delete the specified element from the queue.
+
+        This function takes time O(log(n)).
+        """
+        index = elem.index
+        assert self.heap[index] is elem
+
+        node = self.heap.pop()
+        if index < len(self.heap):
+            node.index = index
+            self.heap[index] = node
+            if node.prio < elem.prio:
+                self._sift_up(index)
+            elif node.prio > elem.prio:
+                self._sift_down(index)
+
+    def decrease_prio(self, elem: Node[_ElemT], prio: float) -> None:
+        """Decrease the priority of an existing element in the queue.
+
+        This function takes time O(log(n)).
+        """
+        assert self.heap[elem.index] is elem
+        assert prio <= elem.prio
+        elem.prio = prio
+        self._sift_up(elem.index)
+
+    def increase_prio(self, elem: Node[_ElemT], prio: float) -> None:
+        """Increase the priority of an existing element in the queue.
+
+        This function takes time O(log(n)).
+        """
+        assert self.heap[elem.index] is elem
+        assert prio >= elem.prio
+        elem.prio = prio
+        self._sift_down(elem.index)
--- a/python/mwmatching/py.typed
+++ b/python/mwmatching/py.typed
--- a/python/pyproject.toml
+++ b/python/pyproject.toml
@ -0,0 +1,15 @@
+
+[build-system]
+requires = ["setuptools >= 61.0"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "mwmatching"
+version = "3.0"
+authors = [{name = "Joris van Rantwijk"}]
+requires-python = ">= 3.7"
+classifiers = [
+    "License :: OSI Approved :: MIT License",
+    "Private :: Do Not Upload"
+]
+
--- a/python/run_matching.py
+++ b/python/run_matching.py
@ -18,7 +18,7 @@ from mwmatching import (maximum_weight_matching,
                        adjust_weights_for_maximum_cardinality_matching)


-def parse_int_or_float(s: str) -> int|float:
+def parse_int_or_float(s: str) -> float:
    """Convert a string to integer or float value."""
    try:
        return int(s)
@ -27,7 +27,7 @@ def parse_int_or_float(s: str) -> int|float:
    return float(s)


-def read_dimacs_graph(f: TextIO) -> list[tuple[int, int, int|float]]:
+def read_dimacs_graph(f: TextIO) -> list[tuple[int, int, float]]:
    """Read a graph in DIMACS edge list format."""

    edges: list[tuple[int, int, float]] = []
@ -70,7 +70,7 @@ def read_dimacs_graph(f: TextIO) -> list[tuple[int, int, int|float]]:
    return edges


-def read_dimacs_graph_file(filename: str) -> list[tuple[int, int, int|float]]:
+def read_dimacs_graph_file(filename: str) -> list[tuple[int, int, float]]:
    """Read a graph from file or stdin."""
    if filename:
        with open(filename, "r", encoding="ascii") as f:
@ -87,11 +87,11 @@ def read_dimacs_graph_file(filename: str) -> list[tuple[int, int, int|float]]:

 def read_dimacs_matching(
        f: TextIO
-        ) -> tuple[int|float, list[tuple[int, int]]]:
+        ) -> tuple[float, list[tuple[int, int]]]:
    """Read a matching solution in DIMACS format."""

    have_weight = False
-    weight: int|float = 0
+    weight: float = 0
    pairs: list[tuple[int, int]] = []

    for line in f:
@ -138,7 +138,7 @@ def read_dimacs_matching(

 def read_dimacs_matching_file(
        filename: str
-        ) -> tuple[int|float, list[tuple[int, int]]]:
+        ) -> tuple[float, list[tuple[int, int]]]:
    """Read a matching from file."""
    with open(filename, "r", encoding="ascii") as f:
        try:
@ -149,7 +149,7 @@ def read_dimacs_matching_file(

 def write_dimacs_matching(
        f: TextIO,
-        weight: int|float,
+        weight: float,
        pairs: list[tuple[int, int]]
        ) -> None:
    """Write a matching solution in DIMACS format."""
@ -165,7 +165,7 @@ def write_dimacs_matching(

 def write_dimacs_matching_file(
        filename: str,
-        weight: int|float,
+        weight: float,
        pairs: list[tuple[int, int]]
        ) -> None:
    """Write a matching to file or stdout."""
@ -177,15 +177,15 @@ def write_dimacs_matching_file(


 def calc_matching_weight(
-        edges: list[tuple[int, int, int|float]],
+        edges: list[tuple[int, int, float]],
        pairs: list[tuple[int, int]]
-        ) -> int|float:
+        ) -> float:
    """Verify that the matching is valid and calculate its weight.

    Matched pairs are assumed to be in the same order as edges.
    """

-    weight: int|float = 0
+    weight: float = 0

    edge_pos = 0
    for pair in pairs:
@ -267,7 +267,7 @@ def verify_matching(filename: str, maxcard: bool, wfactor: float) -> bool:
    edges = read_dimacs_graph_file(filename)
    (gold_weight, gold_pairs) = read_dimacs_matching_file(matching_filename)

-    edges_adj: Sequence[tuple[int, int, int|float]] = edges
+    edges_adj: Sequence[tuple[int, int, float]] = edges

    if wfactor != 1.0:
        if wfactor.is_integer():
--- a/python/tests/init.py
+++ b/python/tests/init.py
--- a/python/tests/test_algorithm.py
+++ b/python/tests/test_algorithm.py
@ -4,10 +4,12 @@ import math
 import unittest
 from unittest.mock import Mock

-import mwmatching
 from mwmatching import (
    maximum_weight_matching as mwm,
    adjust_weights_for_maximum_cardinality_matching as adj)
+from mwmatching.algorithm import (
+    MatchingError, GraphInfo, Blossom, NonTrivialBlossom, MatchingContext,
+    verify_optimum)


 class TestMaximumWeightMatching(unittest.TestCase):
@ -243,6 +245,35 @@ class TestMaximumWeightMatching(unittest.TestCase):
            mwm([(1,2,19), (1,4,17), (1,5,19), (2,3,15), (2,5,21), (4,6,18), (4,7,11), (5,6,19)]),
            [(1,5), (2,3), (4,6)])

+    def test52_augment_blossom_nested2(self):
+        """augment nested blossoms"""
+        #
+        #   [4]--15     19--[2]
+        #    |     \   /     |
+        #   16      [1]     21
+        #    |     /   \     |
+        #   [5]--17     19--[3]
+        #    |
+        #   10
+        #    |
+        #   [6]
+        #
+        self.assertEqual(
+            mwm([(1,2,19), (1,3,19), (1,4,15), (1,5,17), (2,3,21), (4,5,16), (5,6,10)]),
+            [(1,4), (2,3), (5,6)])
+
+    def test61_triangles_n9(self):
+        """t.f 9 nodes"""
+        #
+        #  [1]------[4]      [7]
+        #   | \      | \      | \
+        #   |  [3]   |  [6]   |  [9]
+        #   | /      | /      | /
+        #  [2]      [5]------[8]
+        #
+        result = mwm([(1,2,1), (1,3,1), (2,3,1), (4,5,1), (4,6,1), (5,6,1), (7,8,1), (7,9,1), (8,9,1), (1,4,1), (5,8,1)])
+        self.assertEqual(len(result), 4)
+
    def test_fail_bad_input(self):
        """bad input values"""
        with self.assertRaises(TypeError):
@ -402,10 +433,10 @@ class TestMaximumCardinalityMatching(unittest.TestCase):


 class TestGraphInfo(unittest.TestCase):
-    """Test _GraphInfo helper class."""
+    """Test GraphInfo helper class."""

    def test_empty(self):
-        graph = mwmatching._GraphInfo([])
+        graph = GraphInfo([])
        self.assertEqual(graph.num_vertex, 0)
        self.assertEqual(graph.edges, [])
        self.assertEqual(graph.adjacent_edges, [])
@ -420,8 +451,8 @@ class TestVerificationFail(unittest.TestCase):
            vertex_mate,
            vertex_dual_2x,
            nontrivial_blossom):
-        ctx = Mock(spec=mwmatching._MatchingContext)
-        ctx.graph = mwmatching._GraphInfo(edges)
+        ctx = Mock(spec=MatchingContext)
+        ctx.graph = GraphInfo(edges)
        ctx.vertex_mate = vertex_mate
        ctx.vertex_dual_2x = vertex_dual_2x
        ctx.nontrivial_blossom = nontrivial_blossom
@ -434,7 +465,7 @@ class TestVerificationFail(unittest.TestCase):
            vertex_mate=[-1, 2, 1],
            vertex_dual_2x=[0, 20, 2],
            nontrivial_blossom=[])
-        mwmatching._verify_optimum(ctx)
+        verify_optimum(ctx)

    def test_asymmetric_matching(self):
        edges = [(0,1,10), (1,2,11)]
@ -443,8 +474,8 @@ class TestVerificationFail(unittest.TestCase):
            vertex_mate=[-1, 2, 0],
            vertex_dual_2x=[0, 20, 2],
            nontrivial_blossom=[])
-        with self.assertRaises(mwmatching.MatchingError):
-            mwmatching._verify_optimum(ctx)
+        with self.assertRaises(MatchingError):
+            verify_optimum(ctx)

    def test_nonexistent_matched_edge(self):
        edges = [(0,1,10), (1,2,11)]
@ -453,8 +484,8 @@ class TestVerificationFail(unittest.TestCase):
            vertex_mate=[2, -1, 0],
            vertex_dual_2x=[11, 11, 11],
            nontrivial_blossom=[])
-        with self.assertRaises(mwmatching.MatchingError):
-            mwmatching._verify_optimum(ctx)
+        with self.assertRaises(MatchingError):
+            verify_optimum(ctx)

    def test_negative_vertex_dual(self):
        edges = [(0,1,10), (1,2,11)]
@ -463,8 +494,8 @@ class TestVerificationFail(unittest.TestCase):
            vertex_mate=[-1, 2, 1],
            vertex_dual_2x=[-2, 22, 0],
            nontrivial_blossom=[])
-        with self.assertRaises(mwmatching.MatchingError):
-            mwmatching._verify_optimum(ctx)
+        with self.assertRaises(MatchingError):
+            verify_optimum(ctx)

    def test_unmatched_nonzero_dual(self):
        edges = [(0,1,10), (1,2,11)]
@ -473,8 +504,8 @@ class TestVerificationFail(unittest.TestCase):
            vertex_mate=[-1, 2, 1],
            vertex_dual_2x=[9, 11, 11],
            nontrivial_blossom=[])
-        with self.assertRaises(mwmatching.MatchingError):
-            mwmatching._verify_optimum(ctx)
+        with self.assertRaises(MatchingError):
+            verify_optimum(ctx)

    def test_negative_edge_slack(self):
        edges = [(0,1,10), (1,2,11)]
@ -483,8 +514,8 @@ class TestVerificationFail(unittest.TestCase):
            vertex_mate=[-1, 2, 1],
            vertex_dual_2x=[0, 11, 11],
            nontrivial_blossom=[])
-        with self.assertRaises(mwmatching.MatchingError):
-            mwmatching._verify_optimum(ctx)
+        with self.assertRaises(MatchingError):
+            verify_optimum(ctx)

    def test_matched_edge_slack(self):
        edges = [(0,1,10), (1,2,11)]
@ -493,8 +524,8 @@ class TestVerificationFail(unittest.TestCase):
            vertex_mate=[-1, 2, 1],
            vertex_dual_2x=[0, 20, 11],
            nontrivial_blossom=[])
-        with self.assertRaises(mwmatching.MatchingError):
-            mwmatching._verify_optimum(ctx)
+        with self.assertRaises(MatchingError):
+            verify_optimum(ctx)

    def test_negative_blossom_dual(self):
        #
@ -503,11 +534,8 @@ class TestVerificationFail(unittest.TestCase):
        #    \----8-----/
        #
        edges = [(0,1,7), (0,2,8), (1,2,9), (2,3,6)]
-        blossom = mwmatching._NonTrivialBlossom(
-            subblossoms=[
-                mwmatching._Blossom(0),
-                mwmatching._Blossom(1),
-                mwmatching._Blossom(2)],
+        blossom = NonTrivialBlossom(
+            subblossoms=[Blossom(0), Blossom(1), Blossom(2)],
            edges=[0,2,1])
        for sub in blossom.subblossoms:
            sub.parent = blossom
@ -517,8 +545,8 @@ class TestVerificationFail(unittest.TestCase):
            vertex_mate=[1, 0, 3, 2],
            vertex_dual_2x=[4, 6, 8, 4],
            nontrivial_blossom=[blossom])
-        with self.assertRaises(mwmatching.MatchingError):
-            mwmatching._verify_optimum(ctx)
+        with self.assertRaises(MatchingError):
+            verify_optimum(ctx)

    def test_blossom_not_full(self):
        #
@ -531,11 +559,8 @@ class TestVerificationFail(unittest.TestCase):
        #    \----2-----/
        #
        edges = [(0,1,7), (0,2,2), (1,2,5), (0,3,8), (1,4,8)]
-        blossom = mwmatching._NonTrivialBlossom(
-            subblossoms=[
-                mwmatching._Blossom(0),
-                mwmatching._Blossom(1),
-                mwmatching._Blossom(2)],
+        blossom = NonTrivialBlossom(
+            subblossoms=[Blossom(0), Blossom(1), Blossom(2)],
            edges=[0,2,1])
        for sub in blossom.subblossoms:
            sub.parent = blossom
@ -545,8 +570,62 @@ class TestVerificationFail(unittest.TestCase):
            vertex_mate=[3, 4, -1, 0, 1],
            vertex_dual_2x=[4, 10, 0, 12, 6],
            nontrivial_blossom=[blossom])
-        with self.assertRaises(mwmatching.MatchingError):
-            mwmatching._verify_optimum(ctx)
+        with self.assertRaises(MatchingError):
+            verify_optimum(ctx)
+
+
+class TestValueRange(unittest.TestCase):
+    """Test graphs that force big values for dual variables or edge slack."""
+
+    def test_big_blossom_dual(self):
+        """Force modified blossom dual close to 2*maxweight."""
+        #
+        #              [3]
+        #              / \
+        #          W-4/   \W
+        #      7     /     \
+        #  [1]-----[2]-----[4]
+        #          |   W-4
+        #         5|
+        #          |   1
+        #         [5]-----[6]
+        #
+        w = 100000
+        pairs = mwm([(1,2,7), (2,3,w-4), (2,4,w-4), (2,5,5), (3,4,w), (5,6,1)])
+        self.assertEqual(pairs, [(1,2), (3,4), (5,6)])
+
+    def test_negative_blossom_dual(self):
+        """Force modified blossom dual close to -maxweight."""
+        #
+        #              [3]
+        #              / \
+        #            5/   \7
+        #       1    /     \
+        #  [1]-----[2]-----[4]
+        #           |   5
+        #          1|
+        #           |   W
+        #          [5]-----[6]
+        #
+        w = 100000
+        pairs = mwm([(1,2,1), (2,3,5), (2,4,5), (2,5,1), (3,4,7), (5,6,w)])
+        self.assertEqual(pairs, [(1,2), (3,4), (5,6)])
+
+    def test_big_edge_slack(self):
+        """Force modified edge slack close to 3*maxweight."""
+        #
+        #      6       W      W-2      5
+        # [1]-----[2]-----[3]-----[4]-----[5]
+        #  |               |
+        #  |1              |1
+        #  |               |
+        # [6]-----[7]-----[8]-----[9]-----[10]
+        #      6       W      W-2      5
+        #
+        w = 100000
+        pairs = mwm([(1,2,6), (1,6,1), (2,3,w), (3,4,w-2), (3,8,1), (4,5,5),
+                     (6,7,6), (7,8,w), (8,9,w-2), (9,10,5)])
+        self.assertEqual(pairs, [(1,6), (2,3), (4,5), (7,8), (9,10)])


 if __name__ == "__main__":
--- a/python/tests/test_datastruct.py
+++ b/python/tests/test_datastruct.py
@ -0,0 +1,525 @@
+"""Unit tests for data structures."""
+
+import random
+import unittest
+
+from mwmatching.datastruct import ConcatenableQueue, PriorityQueue
+
+
+class TestConcatenableQueue(unittest.TestCase):
+    """Test ConcatenableQueue."""
+
+    def _check_tree(self, queue):
+        """Check tree balancing rules and priority info."""
+
+        self.assertIsNone(queue.tree.parent)
+        self.assertIs(queue.tree.owner, queue)
+
+        nodes = [queue.tree]
+        while nodes:
+
+            node = nodes.pop()
+
+            if node is not queue.tree:
+                self.assertIsNone(node.owner)
+
+            if node.height == 0:
+                self.assertEqual(len(node.childs), 0)
+                self.assertIs(node.min_node, node)
+            else:
+                self.assertIn(len(node.childs), (2, 3))
+                best_node = set()
+                best_prio = None
+                for child in node.childs:
+                    self.assertIs(child.parent, node)
+                    self.assertEqual(child.height, node.height - 1)
+                    nodes.append(child)
+                    if ((best_prio is None)
+                            or (child.min_node.prio < best_prio)):
+                        best_node = {child.min_node}
+                        best_prio = child.min_node.prio
+                    elif child.min_node.prio == best_prio:
+                        best_node.add(child.min_node)
+
+                self.assertEqual(node.min_node.prio, best_prio)
+                self.assertIn(node.min_node, best_node)
+
+    def test_single(self):
+        """Single element."""
+        q = ConcatenableQueue("Q")
+
+        with self.assertRaises(Exception):
+            q.min_prio()
+
+        with self.assertRaises(Exception):
+            q.min_elem()
+
+        n = q.insert("a", 4)
+        self.assertIsInstance(n, ConcatenableQueue.Node)
+
+        self._check_tree(q)
+
+        self.assertEqual(n.find(), "Q")
+        self.assertEqual(q.min_prio(), 4)
+        self.assertEqual(q.min_elem(), "a")
+
+        with self.assertRaises(Exception):
+            q.insert("x", 1)
+
+        n.set_prio(8)
+        self._check_tree(q)
+
+        self.assertEqual(n.find(), "Q")
+        self.assertEqual(q.min_prio(), 8)
+        self.assertEqual(q.min_elem(), "a")
+
+        q.clear()
+
+    def test_simple(self):
+        """Simple test, 5 elements."""
+        q1 = ConcatenableQueue("A")
+        n1 = q1.insert("a", 5)
+
+        q2 = ConcatenableQueue("B")
+        n2 = q2.insert("b", 6)
+
+        q3 = ConcatenableQueue("C")
+        n3 = q3.insert("c", 7)
+
+        q4 = ConcatenableQueue("D")
+        n4 = q4.insert("d", 4)
+
+        q5 = ConcatenableQueue("E")
+        n5 = q5.insert("e", 3)
+
+        q345 = ConcatenableQueue("P")
+        q345.merge([q3, q4, q5])
+        self._check_tree(q345)
+
+        self.assertEqual(n1.find(), "A")
+        self.assertEqual(n3.find(), "P")
+        self.assertEqual(n4.find(), "P")
+        self.assertEqual(n5.find(), "P")
+        self.assertEqual(q345.min_prio(), 3)
+        self.assertEqual(q345.min_elem(), "e")
+
+        with self.assertRaises(Exception):
+            q3.min_prio()
+
+        self._check_tree(q345)
+        n5.set_prio(6)
+        self._check_tree(q345)
+
+        self.assertEqual(q345.min_prio(), 4)
+        self.assertEqual(q345.min_elem(), "d")
+
+        q12 = ConcatenableQueue("Q")
+        q12.merge([q1, q2])
+        self._check_tree(q12)
+
+        self.assertEqual(n1.find(), "Q")
+        self.assertEqual(n2.find(), "Q")
+        self.assertEqual(q12.min_prio(), 5)
+        self.assertEqual(q12.min_elem(), "a")
+
+        q12345 = ConcatenableQueue("R")
+        q12345.merge([q12, q345])
+        self._check_tree(q12345)
+
+        self.assertEqual(n1.find(), "R")
+        self.assertEqual(n2.find(), "R")
+        self.assertEqual(n3.find(), "R")
+        self.assertEqual(n4.find(), "R")
+        self.assertEqual(n5.find(), "R")
+        self.assertEqual(q12345.min_prio(), 4)
+        self.assertEqual(q12345.min_elem(), "d")
+
+        n4.set_prio(8)
+        self._check_tree(q12345)
+
+        self.assertEqual(q12345.min_prio(), 5)
+        self.assertEqual(q12345.min_elem(), "a")
+
+        n3.set_prio(2)
+        self._check_tree(q12345)
+
+        self.assertEqual(q12345.min_prio(), 2)
+        self.assertEqual(q12345.min_elem(), "c")
+
+        q12345.split()
+        self._check_tree(q12)
+        self._check_tree(q345)
+
+        self.assertEqual(n1.find(), "Q")
+        self.assertEqual(n2.find(), "Q")
+        self.assertEqual(n3.find(), "P")
+        self.assertEqual(n4.find(), "P")
+        self.assertEqual(n5.find(), "P")
+        self.assertEqual(q12.min_prio(), 5)
+        self.assertEqual(q12.min_elem(), "a")
+        self.assertEqual(q345.min_prio(), 2)
+        self.assertEqual(q345.min_elem(), "c")
+
+        q12.split()
+        self._check_tree(q1)
+        self._check_tree(q2)
+
+        q345.split()
+        self._check_tree(q3)
+        self._check_tree(q4)
+        self._check_tree(q5)
+
+        self.assertEqual(n1.find(), "A")
+        self.assertEqual(n2.find(), "B")
+        self.assertEqual(n3.find(), "C")
+        self.assertEqual(n4.find(), "D")
+        self.assertEqual(n5.find(), "E")
+        self.assertEqual(q3.min_prio(), 2)
+        self.assertEqual(q3.min_elem(), "c")
+
+        q1.clear()
+        q2.clear()
+        q3.clear()
+        q4.clear()
+        q5.clear()
+        q12.clear()
+        q345.clear()
+        q12345.clear()
+
+    def test_medium(self):
+        """Medium test, 14 elements."""
+
+        prios = [3, 8, 6, 2, 9, 4, 6, 8, 1, 5, 9, 4, 7, 8]
+
+        queues = []
+        nodes = []
+        for i in range(14):
+            q = ConcatenableQueue(chr(ord("A") + i))
+            n = q.insert(chr(ord("a") + i), prios[i])
+            queues.append(q)
+            nodes.append(n)
+
+        q = ConcatenableQueue("AB")
+        q.merge(queues[0:2])
+        queues.append(q)
+        self._check_tree(q)
+        self.assertEqual(q.min_prio(), min(prios[0:2]))
+
+        q = ConcatenableQueue("CDE")
+        q.merge(queues[2:5])
+        queues.append(q)
+        self._check_tree(q)
+        self.assertEqual(q.min_prio(), min(prios[2:5]))
+
+        q = ConcatenableQueue("FGHI")
+        q.merge(queues[5:9])
+        queues.append(q)
+        self._check_tree(q)
+        self.assertEqual(q.min_prio(), min(prios[5:9]))
+
+        q = ConcatenableQueue("JKLMN")
+        q.merge(queues[9:14])
+        queues.append(q)
+        self._check_tree(q)
+        self.assertEqual(q.min_prio(), min(prios[9:14]))
+
+        for i in range(0, 2):
+            self.assertEqual(nodes[i].find(), "AB")
+        for i in range(2, 5):
+            self.assertEqual(nodes[i].find(), "CDE")
+        for i in range(5, 9):
+            self.assertEqual(nodes[i].find(), "FGHI")
+        for i in range(9, 14):
+            self.assertEqual(nodes[i].find(), "JKLMN")
+
+        q = ConcatenableQueue("ALL")
+        q.merge(queues[14:18])
+        queues.append(q)
+        self._check_tree(q)
+        self.assertEqual(q.min_prio(), 1)
+        self.assertEqual(q.min_elem(), "i")
+
+        for i in range(14):
+            self.assertEqual(nodes[i].find(), "ALL")
+
+        prios[8] = 5
+        nodes[8].set_prio(prios[8])
+        self.assertEqual(q.min_prio(), 2)
+        self.assertEqual(q.min_elem(), "d")
+
+        q.split()
+
+        for i in range(0, 2):
+            self.assertEqual(nodes[i].find(), "AB")
+        for i in range(2, 5):
+            self.assertEqual(nodes[i].find(), "CDE")
+        for i in range(5, 9):
+            self.assertEqual(nodes[i].find(), "FGHI")
+        for i in range(9, 14):
+            self.assertEqual(nodes[i].find(), "JKLMN")
+
+        self.assertEqual(queues[14].min_prio(), min(prios[0:2]))
+        self.assertEqual(queues[15].min_prio(), min(prios[2:5]))
+        self.assertEqual(queues[16].min_prio(), min(prios[5:9]))
+        self.assertEqual(queues[17].min_prio(), min(prios[9:14]))
+
+        for q in queues[14:18]:
+            self._check_tree(q)
+            q.split()
+
+        for i in range(14):
+            self._check_tree(queues[i])
+            self.assertEqual(nodes[i].find(), chr(ord("A") + i))
+            self.assertEqual(queues[i].min_prio(), prios[i])
+            self.assertEqual(queues[i].min_elem(), chr(ord("a") + i))
+
+        for q in queues:
+            q.clear()
+
+    def test_random(self):
+        """Pseudo-random test."""
+
+        rng = random.Random(23456)
+
+        nodes = []
+        prios = []
+        queues = {}
+        queue_nodes = {}
+        queue_subs = {}
+        live_queues = set()
+        live_merged_queues = set()
+
+        for i in range(4000):
+            name = f"q{i}"
+            q = ConcatenableQueue(name)
+            p = rng.random()
+            n = q.insert(f"n{i}", p)
+            nodes.append(n)
+            prios.append(p)
+            queues[name] = q
+            queue_nodes[name] = {i}
+            live_queues.add(name)
+
+        for i in range(2000):
+
+            for k in range(10):
+                t = rng.randint(0, len(nodes) - 1)
+                name = nodes[t].find()
+                self.assertIn(name, live_queues)
+                self.assertIn(t, queue_nodes[name])
+                p = rng.random()
+                prios[t] = p
+                nodes[t].set_prio(p)
+                pp = min(prios[tt] for tt in queue_nodes[name])
+                tt = prios.index(pp)
+                self.assertEqual(queues[name].min_prio(), pp)
+                self.assertEqual(queues[name].min_elem(), f"n{tt}")
+
+            k = rng.randint(2, max(2, len(live_queues) // 2 - 400))
+            subs = rng.sample(sorted(live_queues), k)
+
+            name = f"Q{i}"
+            q = ConcatenableQueue(name)
+            q.merge([queues[nn] for nn in subs])
+            self._check_tree(q)
+            queues[name] = q
+            queue_nodes[name] = set().union(*(queue_nodes[nn] for nn in subs))
+            queue_subs[name] = set(subs)
+            live_queues.difference_update(subs)
+            live_merged_queues.difference_update(subs)
+            live_queues.add(name)
+            live_merged_queues.add(name)
+
+            pp = min(prios[tt] for tt in queue_nodes[name])
+            tt = prios.index(pp)
+            self.assertEqual(q.min_prio(), pp)
+            self.assertEqual(q.min_elem(), f"n{tt}")
+
+            if len(live_merged_queues) >= 100:
+                name = rng.choice(sorted(live_merged_queues))
+                queues[name].split()
+
+                for nn in queue_subs[name]:
+                    self._check_tree(queues[nn])
+                    pp = min(prios[tt] for tt in queue_nodes[nn])
+                    tt = prios.index(pp)
+                    self.assertEqual(queues[nn].min_prio(), pp)
+                    self.assertEqual(queues[nn].min_elem(), f"n{tt}")
+                    live_queues.add(nn)
+                    if nn in queue_subs:
+                        live_merged_queues.add(nn)
+
+                live_merged_queues.remove(name)
+                live_queues.remove(name)
+
+                del queues[name]
+                del queue_nodes[name]
+                del queue_subs[name]
+
+        for q in queues.values():
+            q.clear()
+
+
+class TestPriorityQueue(unittest.TestCase):
+    """Test PriorityQueue."""
+
+    def test_empty(self):
+        """Empty queue."""
+        q = PriorityQueue()
+        self.assertTrue(q.empty())
+        with self.assertRaises(IndexError):
+            q.find_min()
+
+    def test_single(self):
+        """Single element."""
+        q = PriorityQueue()
+
+        n1 = q.insert(5, "a")
+        self.assertEqual(n1.prio, 5)
+        self.assertEqual(n1.data, "a")
+        self.assertFalse(q.empty())
+        self.assertIs(q.find_min(), n1)
+
+        q.decrease_prio(n1, 3)
+        self.assertEqual(n1.prio, 3)
+        self.assertIs(q.find_min(), n1)
+
+        q.delete(n1)
+        self.assertTrue(q.empty())
+
+    def test_simple(self):
+        """A few elements."""
+        prios = [9, 4, 7, 5, 8, 6, 4, 5, 2, 6]
+        labels = "abcdefghij"
+
+        q = PriorityQueue()
+
+        elems = [q.insert(prio, data) for (prio, data) in zip(prios, labels)]
+        for (n, prio, data) in zip(elems, prios, labels):
+            self.assertEqual(n.prio, prio)
+            self.assertEqual(n.data, data)
+
+        self.assertIs(q.find_min(), elems[8])
+
+        q.decrease_prio(elems[2], 1)
+        self.assertIs(q.find_min(), elems[2])
+
+        q.decrease_prio(elems[4], 3)
+        self.assertIs(q.find_min(), elems[2])
+
+        q.delete(elems[2])
+        self.assertIs(q.find_min(), elems[8])
+
+        q.delete(elems[8])
+        self.assertIs(q.find_min(), elems[4])
+
+        q.delete(elems[4])
+        q.delete(elems[1])
+        self.assertIs(q.find_min(), elems[6])
+
+        q.delete(elems[3])
+        q.delete(elems[9])
+        self.assertIs(q.find_min(), elems[6])
+
+        q.delete(elems[6])
+        self.assertIs(q.find_min(), elems[7])
+
+        q.delete(elems[7])
+        self.assertIs(q.find_min(), elems[5])
+
+        self.assertFalse(q.empty())
+        q.clear()
+        self.assertTrue(q.empty())
+
+    def test_increase_prio(self):
+        """Increase priority of existing element."""
+
+        q = PriorityQueue()
+
+        n1 = q.insert(5, "a")
+        q.increase_prio(n1, 8)
+        self.assertEqual(n1.prio, 8)
+        self.assertIs(q.find_min(), n1)
+
+        q = PriorityQueue()
+        n1 = q.insert(9, "a")
+        n2 = q.insert(4, "b")
+        n3 = q.insert(7, "c")
+        n4 = q.insert(5, "d")
+        self.assertIs(q.find_min(), n2)
+
+        q.increase_prio(n2, 8)
+        self.assertEqual(n2.prio, 8)
+        self.assertIs(q.find_min(), n4)
+
+        q.increase_prio(n3, 10)
+        self.assertEqual(n3.prio, 10)
+        self.assertIs(q.find_min(), n4)
+
+        q.delete(n4)
+        self.assertIs(q.find_min(), n2)
+
+        q.delete(n2)
+        self.assertIs(q.find_min(), n1)
+
+        q.delete(n1)
+        self.assertIs(q.find_min(), n3)
+        self.assertEqual(n3.prio, 10)
+
+        q.delete(n3)
+        self.assertTrue(q.empty())
+
+    def test_random(self):
+        """Pseudo-random test."""
+        rng = random.Random(34567)
+
+        num_elem = 1000
+
+        seq = 0
+        elems = []
+        q = PriorityQueue()
+
+        def check():
+            min_prio = min(prio for (n, prio, data) in elems)
+            m = q.find_min()
+            self.assertIn((m, m.prio, m.data), elems)
+            self.assertEqual(m.prio, min_prio)
+
+        for i in range(num_elem):
+            seq += 1
+            prio = rng.randint(0, 1000000)
+            elems.append((q.insert(prio, seq), prio, seq))
+            check()
+
+        for i in range(10000):
+            p = rng.randint(0, num_elem - 1)
+            prio = rng.randint(0, 1000000)
+            if prio <= elems[p][1]:
+                q.decrease_prio(elems[p][0], prio)
+            else:
+                q.increase_prio(elems[p][0], prio)
+            elems[p] = (elems[p][0], prio, elems[p][2])
+            check()
+
+            p = rng.randint(0, num_elem - 1)
+            q.delete(elems[p][0])
+            elems.pop(p)
+            check()
+
+            seq += 1
+            prio = rng.randint(0, 1000000)
+            elems.append((q.insert(prio, seq), prio, seq))
+            check()
+
+        for i in range(num_elem):
+            p = rng.randint(0, num_elem - 1 - i)
+            q.delete(elems[p][0])
+            elems.pop(p)
+            if elems:
+                check()
+
+        self.assertTrue(q.empty())
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/run_checks.sh
+++ b/run_checks.sh
@ -1,22 +0,0 @@
-#!/bin/sh
-
-set -e
-
-echo
-echo "Running mypy"
-mypy --disallow-incomplete-defs python tests
-
-echo
-echo "Running pylint"
-pylint --ignore=test_mwmatching.py python tests
-
-echo
-echo "Running test_mwmatching.py"
-python3 python/test_mwmatching.py
-
-echo
-echo "Checking test coverage"
-coverage erase
-coverage run --branch python/test_mwmatching.py
-coverage report -m
-
--- a/run_tests.sh
+++ b/run_tests.sh
@ -1,57 +0,0 @@
-#!/bin/sh
-
-set -e
-
-echo
-echo "Running Python unit tests"
-echo
-
-python3 --version
-echo
-
-python3 python/test_mwmatching.py
-
-echo
-echo "Checking test coverage"
-echo
-
-coverage erase
-coverage run --branch python/test_mwmatching.py
-coverage report -m
-
-echo
-echo "Running Python code on test graphs"
-echo
-
-python3 python/run_matching.py --verify \
-    tests/graphs/chain_n1000.edge \
-    tests/graphs/chain_n5000.edge \
-    tests/graphs/sparse_delta_n1004.edge \
-    tests/graphs/triangles_n1002.edge \
-    tests/graphs/triangles_n5001.edge \
-    tests/graphs/random_n1000_m10000.edge \
-    tests/graphs/random_n2000_m10000.edge
-
-echo
-echo "Running C++ unit tests"
-echo
-
-g++ --version
-echo
-
-make -C cpp run_matching test_mwmatching
-cpp/test_mwmatching
-
-echo
-echo "Running C++ code on test graphs"
-echo
-
-python3 tests/run_test.py --solver cpp/run_matching --verify \
-    tests/graphs/chain_n1000.edge \
-    tests/graphs/chain_n5000.edge \
-    tests/graphs/sparse_delta_n1004.edge \
-    tests/graphs/triangles_n1002.edge \
-    tests/graphs/triangles_n5001.edge \
-    tests/graphs/random_n1000_m10000.edge \
-    tests/graphs/random_n2000_m10000.edge
-
--- a/test_cpp.sh
+++ b/test_cpp.sh
@ -0,0 +1,35 @@
+#!/bin/sh
+
+set -e
+
+echo
+echo ">> Running unit tests"
+echo
+
+g++ --version
+echo
+
+make -C cpp run_matching test_mwmatching
+cpp/test_mwmatching
+
+echo
+echo ">> Running test graphs"
+echo
+
+python3 tests/run_test.py --solver cpp/run_matching --verify \
+    tests/graphs/chain_n1000.edge \
+    tests/graphs/chain_n5000.edge \
+    tests/graphs/chain_n10000.edge \
+    tests/graphs/sparse_delta_n1004.edge \
+    tests/graphs/sparse_delta_n2004.edge \
+    tests/graphs/sparse_delta_n5004.edge \
+    tests/graphs/triangles_n1002.edge \
+    tests/graphs/triangles_n5001.edge \
+    tests/graphs/triangles_n10002.edge \
+    tests/graphs/random_n1000_m10000.edge \
+    tests/graphs/random_n2000_m10000.edge \
+    tests/graphs/random_n4000_m10000.edge
+
+echo
+echo ">> Done"
+
--- a/test_python.sh
+++ b/test_python.sh
@ -0,0 +1,47 @@
+#!/bin/bash
+
+set -e
+
+python3 --version
+
+echo
+echo ">> Running pycodestyle"
+pycodestyle python/mwmatching python/run_matching.py tests
+
+echo
+echo ">> Running mypy"
+mypy --disallow-incomplete-defs python tests
+
+echo
+echo ">> Running pylint"
+pylint --ignore=test_algorithm.py python tests/*.py tests/generate/*.py || [ $(($? & 3)) -eq 0 ]
+
+echo
+echo ">> Running unit tests"
+python3 -m unittest discover -t python -s python/tests
+
+echo
+echo ">> Checking test coverage"
+coverage erase
+coverage run --branch -m unittest discover -t python -s python/tests
+coverage report -m
+
+echo
+echo ">> Running test graphs"
+python3 python/run_matching.py --verify \
+    tests/graphs/chain_n1000.edge \
+    tests/graphs/chain_n5000.edge \
+    tests/graphs/chain_n10000.edge \
+    tests/graphs/sparse_delta_n1004.edge \
+    tests/graphs/sparse_delta_n2004.edge \
+    tests/graphs/sparse_delta_n5004.edge \
+    tests/graphs/triangles_n1002.edge \
+    tests/graphs/triangles_n5001.edge \
+    tests/graphs/triangles_n10002.edge \
+    tests/graphs/random_n1000_m10000.edge \
+    tests/graphs/random_n2000_m10000.edge \
+    tests/graphs/random_n4000_m10000.edge
+
+echo
+echo ">> Done"
+
--- a/tests/generate/make_random_graph.py
+++ b/tests/generate/make_random_graph.py
@ -14,7 +14,7 @@ from typing import TextIO

 def write_dimacs_graph(
        f: TextIO,
-        edges: list[tuple[int, int, int|float]]
+        edges: list[tuple[int, int, float]]
        ) -> None:
    """Write a graph in DIMACS edge list format."""

@ -38,7 +38,7 @@ def make_random_graph(
        max_weight: float,
        float_weights: bool,
        rng: random.Random
-        ) -> list[tuple[int, int, int|float]]:
+        ) -> list[tuple[int, int, float]]:
    """Generate a random graph with random edge weights."""

    edge_set: set[tuple[int, int]] = set()
@ -59,9 +59,9 @@ def make_random_graph(
        rng.shuffle(edge_candidates)
        edge_set.update(edge_candidates[:m])

-    edges: list[tuple[int, int, int|float]] = []
+    edges: list[tuple[int, int, float]] = []
    for (x, y) in sorted(edge_set):
-        w: int|float
+        w: float
        if float_weights:
            w = rng.uniform(1.0e-8, max_weight)
        else:
--- a/tests/generate/make_slow_graph.py
+++ b/tests/generate/make_slow_graph.py
@ -19,26 +19,25 @@ count_delta_step = [0]

 def patch_matching_code() -> None:
    """Patch the matching code to count events."""
-    # pylint: disable=import-outside-toplevel,protected-access
+    # pylint: disable=import-outside-toplevel

-    import mwmatching
+    from mwmatching.algorithm import MatchingContext

-    orig_make_blossom = mwmatching._MatchingContext.make_blossom
-    orig_substage_calc_dual_delta = (
-        mwmatching._MatchingContext.substage_calc_dual_delta)
+    orig_make_blossom = MatchingContext.make_blossom
+    orig_calc_dual_delta_step = MatchingContext.calc_dual_delta_step

    def stub_make_blossom(*args: Any, **kwargs: Any) -> Any:
        count_make_blossom[0] += 1
        return orig_make_blossom(*args, **kwargs)

-    def stub_substage_calc_dual_delta(*args: Any, **kwargs: Any) -> Any:
+    def stub_calc_dual_delta_step(*args: Any, **kwargs: Any) -> Any:
        count_delta_step[0] += 1
-        ret = orig_substage_calc_dual_delta(*args, **kwargs)
+        ret = orig_calc_dual_delta_step(*args, **kwargs)
        return ret

-    mwmatching._MatchingContext.make_blossom = stub_make_blossom  # type: ignore
-    mwmatching._MatchingContext.substage_calc_dual_delta = (  # type: ignore
-        stub_substage_calc_dual_delta)
+    MatchingContext.make_blossom = stub_make_blossom  # type: ignore
+    MatchingContext.calc_dual_delta_step = (  # type: ignore
+        stub_calc_dual_delta_step)


 def run_max_weight_matching(
--- a/tests/run_benchmark.sh
+++ b/tests/run_benchmark.sh
@ -0,0 +1,51 @@
+#!/bin/bash
+
+PYMWM=../python/run_matching.py
+CPPMWM=../cpp/run_matching
+LEMON=lemon/lemon_matching
+
+# Generate big graphs that are not in the Git repo.
+python3 generate/make_slow_graph.py chain 20000 > graphs/chain_n20000.edge
+python3 generate/make_slow_graph.py chain 50000 > graphs/chain_n50000.edge
+python3 generate/triangles.py 16667 1 > graphs/triangles_n50001.edge
+
+# Run pre-generated graphs.
+./run_test.py --solver $PYMWM --solver $CPPMWM --solver $LEMON --timeout 600 --runs 5 --verify \
+  graphs/chain_n1000.edge \
+  graphs/chain_n5000.edge \
+  graphs/chain_n10000.edge \
+  graphs/sparse_delta_n1004.edge \
+  graphs/sparse_delta_n2004.edge \
+  graphs/sparse_delta_n5004.edge \
+  graphs/triangles_n1002.edge \
+  graphs/triangles_n5001.edge \
+  graphs/triangles_n10002.edge
+
+./run_test.py --solver $PYMWM --solver $CPPMWM --solver $LEMON --timeout 600 --runs 5 \
+  graphs/chain_n20000.edge \
+  graphs/chain_n50000.edge \
+  graphs/triangles_n50001.edge
+
+# Run random graphs.
+./run_test.py --solver $PYMWM --solver $CPPMWM --solver $LEMON --timeout 600 --random --runs 25 --n 1000 --m 5000 
+
+./run_test.py --solver $PYMWM --solver $CPPMWM --solver $LEMON --timeout 600 --random --runs 25 --n 1000 --m 31622
+
+./run_test.py --solver $PYMWM --solver $CPPMWM --solver $LEMON --timeout 600 --random --runs 25 --n 1000 --m 499500
+
+./run_test.py --solver $PYMWM --solver $CPPMWM --solver $LEMON --timeout 600 --random --runs 25 --n 2000 --m 10000
+
+./run_test.py --solver $PYMWM --solver $CPPMWM --solver $LEMON --timeout 600 --random --runs 25 --n 2000 --m 89442
+
+./run_test.py --solver $PYMWM --solver $CPPMWM --solver $LEMON --timeout 600 --random --runs 25 --n 2000 --m 1999000
+
+./run_test.py --solver $PYMWM --solver $CPPMWM --solver $LEMON --timeout 600 --random --runs 25 --n 5000 --m 25000
+
+./run_test.py --solver $PYMWM --solver $CPPMWM --solver $LEMON --timeout 600 --random --runs 25 --n 5000 --m 353553
+
+./run_test.py --solver $PYMWM --solver $CPPMWM --solver $LEMON --timeout 600 --random --runs 25 --n 10000 --m 50000
+
+./run_test.py --solver $PYMWM --solver $CPPMWM --solver $LEMON --timeout 600 --random --runs 25 --n 10000 --m 1000000
+
+./run_test.py --solver $PYMWM --solver $CPPMWM --solver $LEMON --timeout 600 --random --runs 25 --n 20000 --m 100000
+
--- a/tests/run_test.py
+++ b/tests/run_test.py
@ -29,7 +29,7 @@ class SolverError(Exception):

 class Graph(NamedTuple):
    """Represents a graph. Vertex indices start from 0."""
-    edges: list[tuple[int, int, int|float]]
+    edges: list[tuple[int, int, float]]

    def num_vertex(self) -> int:
        """Count number of vertices."""
@ -55,11 +55,11 @@ class RunStatus(enum.IntEnum):
 class RunResult(NamedTuple):
    """Represent the result of running a solver on a graph."""
    status: RunStatus = RunStatus.OK
-    weight: int|float = 0
+    weight: float = 0
    run_time: Sequence[float] = ()


-def parse_int_or_float(s: str) -> int|float:
+def parse_int_or_float(s: str) -> float:
    """Convert a string to integer or float value."""
    try:
        return int(s)
@ -130,11 +130,11 @@ def write_dimacs_graph(f: TextIO, graph: Graph) -> None:
            print(f"e {x+1} {y+1} {w:.12g}", file=f)


-def read_dimacs_matching(f: TextIO) -> tuple[int|float, Matching]:
+def read_dimacs_matching(f: TextIO) -> tuple[float, Matching]:
    """Read a matching solution in DIMACS format."""

    have_weight = False
-    weight: int|float = 0
+    weight: float = 0
    pairs: list[tuple[int, int]] = []

    for line in f:
@ -208,9 +208,9 @@ def make_random_graph(
        rng.shuffle(edge_candidates)
        edge_set.update(edge_candidates[:m])

-    edges: list[tuple[int, int, int|float]] = []
+    edges: list[tuple[int, int, float]] = []
    for (x, y) in sorted(edge_set):
-        w: int|float
+        w: float
        if float_weights:
            w = rng.uniform(1.0e-8, max_weight)
        else:
@ -220,14 +220,14 @@ def make_random_graph(
    return Graph(edges)


-def check_matching(graph: Graph, matching: Matching) -> int|float:
+def check_matching(graph: Graph, matching: Matching) -> float:
    """Verify that the matching is valid and calculate its weight."""

-    edge_map: dict[tuple[int, int], int|float] = {}
+    edge_map: dict[tuple[int, int], float] = {}
    for (x, y, w) in graph.edges:
        edge_map[(min(x, y), max(x, y))] = w

-    weight: int|float = 0
+    weight: float = 0
    nodes_used: set[int] = set()

    for pair in matching.pairs:
@ -247,7 +247,7 @@ def check_matching(graph: Graph, matching: Matching) -> int|float:
    return weight


-def compare_weight(weight1: int|float, weight2: int|float) -> int:
+def compare_weight(weight1: float, weight2: float) -> int:
    """Compare weights of matchings.

    Returns:
@ -406,7 +406,7 @@ class WmatchSolver(Solver):
        num_edge = len(graph.edges)

        all_integer = True
-        adjacent: list[list[tuple[int, int|float]]] = [
+        adjacent: list[list[tuple[int, float]]] = [
            [] for i in range(num_vertex)]

        for (x, y, w) in graph.edges:
@ -451,13 +451,13 @@ def test_solver_on_graph(
        solver: Solver,
        graph: Graph,
        graph_desc: str,
-        gold_weight: Optional[int|float],
+        gold_weight: Optional[float],
        num_run: int
        ) -> RunResult:
    """Test the specified solver with the specified graph."""

    solver_run_time: list[float] = []
-    solver_weight: Optional[int|float] = None
+    solver_weight: Optional[float] = None

    for i in range(num_run):

@ -507,7 +507,7 @@ def test_graph(
        solvers: list[Solver],
        graph: Graph,
        graph_desc: str,
-        gold_weight: Optional[int|float],
+        gold_weight: Optional[float],
        num_run: int
        ) -> list[RunResult]:
    """Test all solvers with the specified graph."""
@ -525,7 +525,7 @@ def test_graph(

    if gold_weight is None:

-        best_weight: Optional[int|float] = None
+        best_weight: Optional[float] = None
        for result in results:
            if result.status == RunStatus.OK:
                if (best_weight is None) or (result.weight > best_weight):
@ -668,7 +668,7 @@ def test_input(
                  file=sys.stderr)
            return 1

-        gold_weight: Optional[int|float] = None
+        gold_weight: Optional[float] = None
        if verify:
            reffile = os.path.splitext(filename)[0] + ".out"
            try:
@ -740,7 +740,8 @@ def main() -> int:
    parser.add_argument("--timeout",
                        action="store",
                        type=float,
-                        help="abort when solver runs longer than TIMEOUT seconds")
+                        help="abort when solver runs longer than TIMEOUT"
+                             " seconds")
    parser.add_argument("--runs",
                        action="store",
                        type=int,
--- a/tox.ini
+++ b/tox.ini
@ -0,0 +1,50 @@
+
+[tox]
+skipsdist = true
+env_list =
+    static
+    coverage
+    py37
+    py38
+    py39
+    py310
+    py311
+    py312
+    pypy3
+
+[testenv]
+commands =
+    python --version
+    python -m unittest discover -t python -s python/tests
+    python python/run_matching.py --verify \
+        tests/graphs/chain_n1000.edge \
+        tests/graphs/chain_n5000.edge \
+        tests/graphs/chain_n10000.edge \
+        tests/graphs/sparse_delta_n1004.edge \
+        tests/graphs/sparse_delta_n2004.edge \
+        tests/graphs/sparse_delta_n5004.edge \
+        tests/graphs/triangles_n1002.edge \
+        tests/graphs/triangles_n5001.edge \
+        tests/graphs/triangles_n10002.edge \
+        tests/graphs/random_n1000_m10000.edge \
+        tests/graphs/random_n2000_m10000.edge \
+        tests/graphs/random_n4000_m10000.edge
+
+[testenv:static]
+deps =
+    mypy
+    pycodestyle
+    pylint
+commands =
+    pycodestyle python/mwmatching python/run_matching.py tests
+    mypy --disallow-incomplete-defs python tests
+    pylint --ignore=test_algorithm.py python tests/*.py tests/generate/*.py
+
+[testenv:coverage]
+deps =
+    coverage
+commands =
+    coverage erase
+    coverage run --branch -m unittest discover -t python -s python/tests
+    coverage report -m
+
Author	SHA1	Message	Date
Joris van Rantwijk	e103a493fc	Update Algorithm.md	2024-07-28 11:38:05 +02:00
Joris van Rantwijk	4670cf1dca	Add testcases to force big values for dual/slack	2024-07-28 11:38:05 +02:00
Joris van Rantwijk	c731c32473	Separate function top_level_blossom()	2024-07-21 15:32:41 +02:00
Joris van Rantwijk	e8490010d6	Implement ConcatenableQueue as 2-3 tree	2024-07-21 15:32:41 +02:00
Joris van Rantwijk	dc8cdae225	Minor simplification in UnionFindQueue	2024-07-21 15:32:41 +02:00
Joris van Rantwijk	e2f5b63a01	Fix README	2024-07-11 21:28:41 +02:00
Joris van Rantwijk	6bf04df77b	Update README	2024-07-11 21:27:01 +02:00
Joris van Rantwijk	99f8a2d822	Add benchmark script	2024-07-10 21:01:21 +02:00
Joris van Rantwijk	ed70402310	Separate C++ test script	2024-07-10 20:52:59 +02:00
Joris van Rantwijk	54f59db753	Add tox.ini for Python testing	2024-07-10 20:19:12 +02:00
Joris van Rantwijk	c58374e6fb	Remove debug checks of alternating tree	2024-07-09 21:18:53 +02:00
Joris van Rantwijk	658a393bb8	Test script for Python code	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	c19fa9a76c	Add pyproject.toml	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	d3475834ab	Add from __future__ import annotations This makes the code work on Python versions 3.7 and 3.8.	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	147640329f	Restructure Python code as package	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	f2e8ca1357	Remove redundant type annotations "int\|float"	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	b2d4de41f9	Add __slots__ in datastruct.py It does not make a clear difference for performance. But it should at least reduce memory usage a bit.	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	d2debb6d6f	Minor changes to docstrings	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	bbe19a6798	Run all unittests in run_checks.sh	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	50ef772271	Improve test coverage	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	6a75ffaf63	Avoid leaking reference cycles	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	4c6115fb2f	Improve comments and docstrings	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	0e76e6472b	Minor cleanup in scanning and delta steps	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	f35a640e43	Clean up management of the alternating tree	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	b960a85b6c	Clean up magagement of blossom labels	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	f8c6b99842	Clean up least-slack edge tracking	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	1a98624f2b	Solve slow maintanance of blossom list	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	61524990d7	Keep alternating trees between stages Delete only the trees that are involved in an augmenting path. Keep the other trees and reuse them in the next stage. This gives a big speedup on many cases such as random graphs. The code is a mess, needs to be cleaned up.	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	73641d7b70	Add method PriorityQueue.increase_prio	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	0675230692	Code style cleanups	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	de30ac3c5e	Track blossoms in each alternating tree	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	aab2acd78e	Remove redundant clearing of scan queue	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	e9baa88c70	Fix bug in delta2 tracking	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	73479532ac	Implement scan queue as a list instead of deque	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	04b6908449	Do not check edge slack during scan Tight edges are not used immediately during the scan. Just like other edges, tight edges are tracked in priority queues and are used later through a zero-delta step. This simplifies slack calculations.	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	3a77749425	Simplify slack handling in delta2 tracking	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	9ee26584ab	Use UnionFind to find top-level blossom of vertex The run time should now be O(nmlog(n))	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	225311dae0	Implement heap-based tracking for delta2	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	7cc1666cf2	Lazy delta updates of T-vertex duals	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	6318de3b1f	Lazy delta updates of T-blossom duals	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	b2e055b357	Lazy delta updates of S-blossom duals	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	de03796d99	Lazy delta updates of S-vertex duals	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	a23c38eb70	Implement heap-based tracking for delta4	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	13b6b76d47	Implement heap-based edge tracking for delta3	2024-07-09 21:10:38 +02:00
Joris van Rantwijk	91d4afb271	Clean up redundant type annotations "int\|float"	2024-07-09 21:10:00 +02:00
Joris van Rantwijk	c8a3f7f684	Datastructures for O(nmlog(n)) algorithm	2024-05-25 11:50:07 +02:00