diff --git a/doc/Algorithm.md b/doc/Algorithm.md index 09cd83d..1d08728 100644 --- a/doc/Algorithm.md +++ b/doc/Algorithm.md @@ -4,19 +4,20 @@ ## Introduction This document describes the implementation of an algorithm that computes a maximum weight matching -in a general graph in time _O(n3)_, where _n_ is the number of vertices in the graph. +in a general graph in time _O(n (n + m) log n)_, where _n_ is the number of vertices in +the graph and _m_ is the number of edges. In graph theory, a _matching_ is a subset of edges such that none of them share a common vertex. A _maximum cardinality matching_ is a matching that contains the largest possible number of edges (or equivalently, the largest possible number of vertices). -For a graph that has weights attached to its edges, a _maximum weight matching_ +If a graph has weights assigned to its edges, a _maximum weight matching_ is a matching that achieves the largest possible sum of weights of matched edges. An algorithm for maximum weight matching can obviously also be used to calculate a maximum cardinality matching by simply assigning weight 1 to all edges. -Certain computer science problems can be understood as _restrictions_ of maximum weighted matching +Certain related problems can be understood as _restrictions_ of maximum weighted matching in general graphs. Examples are: maximum matching in bipartite graphs, maximum cardinality matching in general graphs, and maximum weighted matching in general graphs with edge weights limited to integers @@ -48,7 +49,7 @@ It is based on the ideas of Edmonds, but uses different data structures to reduc of work. In 1983, Galil, Micali and Gabow published a maximum weighted matching algorithm that runs in -time _O(n\*m\*log(n))_ [[4]](#galil_micali_gabow1986) . +time _O(n m log n)_ [[4]](#galil_micali_gabow1986) . It is an implementation of Edmonds' blossom algorithm that uses advanced data structures to speed up critical parts of the algorithm. This algorithm is asymptotically faster than _O(n3)_ for sparse graphs, @@ -63,45 +64,32 @@ of the literature. The paper describes a maximum weighted matching algorithm that is similar to Edmonds' blossom algorithm, but carefully implemented to run in time _O(n3)_. It then sketches how advanced data structures can be added to arrive at the Galil-Micali-Gabow -algorithm that runs in time _O(n\*m\*log(n))_. +algorithm that runs in time _O(n m log n)_. In 1990, Gabow published a maximum weighted matching algorithm that runs in time -_O(n\*m + n2\*log(n))_ [[6]](#gabow1990) . +_O(n m + n2 log n)_ [[6]](#gabow1990) . It uses several advanced data structures, including Fibonacci heaps. Unfortunately I don't understand this algorithm at all. ## Choosing an algorithm -I selected the _O(n3)_ variant of Edmonds' blossom algorithm as described by -Galil [[5]](#galil1986) . -This algorithm is usually credited to Gabow [[3]](#gabow1974), but I find the description -in [[5]](#galil1986) easier to understand. +I selected the _O(n m log n)_ algorithm by Galil, Micali and Gabow +[[4]](#galil_micali_gabow1986). -This is generally a fine algorithm. -One of its strengths is that it is relatively easy to understand, especially compared -to the more recent algorithms. -Its run time is asymptotically optimal for complete graphs (graphs that have an edge -between every pair of vertices). - -On the other hand, the algorithm is suboptimal for sparse graphs. -It is possible to construct highly sparse graphs, having _m = O(n)_, -that cause this algorithm to perform _Θ(n3)_ steps. -In such cases the Galil-Micali-Gabow algorithm would be significantly faster. - -Then again, there is a question of how important the worst case is for practical applications. -I suspect that the simple algorithm typically needs only about _O(n\*m)_ steps when running -on random sparse graphs with random weights, i.e. much faster than its worst case bound. - -After trading off these properties, I decided that I prefer an algorithm that is understandable -and has decent performance, over an algorithm that is faster in specific cases but also -significantly more complicated. +This algorithm is asymptotically optimal for sparse graphs. +It has also been shown to be quite fast in practice on several types of graphs +including random graphs [[7]](#mehlhorn_schafer2002). +This algorithm is more difficult to implement than the older _O(n3)_ algorithm. +In particular, it requires a specialized data structure to implement mergeable priority queues. +This increases the size and complexity of the code quite a bit. +However, in my opinion the performance improvement is worth the extra effort. ## Description of the algorithm -My implementation closely follows the description by Zvi Galil in [[5]](#galil1986) . -I recommend to read that paper before diving into my description below. +My implementation roughly follows the description by Zvi Galil in [[5]](#galil1986) . +I recommend reading that paper before diving into my description below. The paper explains the algorithm in depth and shows how it relates to matching in bipartite graphs and non-weighted graphs. @@ -260,7 +248,7 @@ How this works will be explained later. To find an augmenting path, the algorithm searches for alternating paths that start in an unmatched vertex. -The collection of alternating paths forms a forest of trees. +The collection of such alternating paths forms a forest of trees. Each tree is rooted in an unmatched vertex, and all paths from the root to the leaves of a tree are alternating paths. The nodes in these trees are top-level blossoms. @@ -314,7 +302,7 @@ The search procedure considers these vertices one-by-one and tries to use them t either grow the alternating tree (thus adding new vertices to the queue), or discover an augmenting path or a new blossom. -In detail, the search for an augmenting path proceeds as follows: +The search for an augmenting path proceeds as follows: - Mark all top-level blossoms as _unlabeled_. - Initialize an empty queue _Q_. @@ -537,8 +525,9 @@ Dual variables of unlabeled blossoms and their vertices remain unchanged. Dual variables _zB_ of non-trivial sub-blossoms also remain changed; only top-level blossoms have their _zB_ updated. -Note that this update does not change the slack of edges that are either matched, -or linked in the alternating tree, or contained in a blossom. +Note that these rules ensure that no change occurs to the slack of any edge which is matched, +or part of an alternating tree, or contained in a blossom. +Such edges are tight and remain tight through the update. However, the update reduces the slack of edges between S blossoms and edges between S-blossoms and unlabeled blossoms. It may cause some of these edges to become tight, allowing them to be used @@ -569,25 +558,62 @@ If the dual update is limited by _δ4_, it causes the dual varia a T-blossom to become zero. The next step is to expand that blossom. -A dual update may find that _δ = 0_, implying that the dual variables don't change. +### Discovering tight edges through delta steps + +A delta step may find that _δ = 0_, implying that the dual variables don't change. This can still be useful since all types of updates have side effects (adding an edge to an alternating tree, or expanding a blossom) that allow the algorithm to make progress. +In fact, it is convenient to let the dual update mechanism drive the entire process of discovering +tight edges and growing alternating trees. -During a single _stage_, the algorithm may iterate several times between scanning tight edges and -updating dual variables. -These iterations are called _substages_. -To clarify: A stage is the process of growing alternating trees until an augmenting path is found. +In my description of the search algorithm above, I stated that a tight edge between +a newly labeled S-vertex and an unlabeled vertex or a different S-blossom should be used to +grow the alternating tree or to create a new blossom or to form an augmenting path. +However, it turns out to be easier to postpone the use of such edges until the next delta step. +While scanning newly labeled S-vertices, edges to unlabeled vertices or different S-blossoms +are discovered but not yet used. +Such edges will merely be indexed in a suitable data structure. +Even if the edge is tight, it will be indexed rather than used right away. + +Once the scan completes, a delta step will be done. +If any tight edges were discovered during the scan, the delta step will find that either +_δ2 = 0_ or _δ3 = 0_. +The corresponding step (growing the alternating tree, creating a blossom or augmenting +the matching) will occur at that point. +If no suitable tight edges exist, a real change of dual variables will occur. + +The search for an augmenting path becomes as follows: + + - Mark all top-level blossoms as _unlabeled_. + - Initialize an empty queue _Q_. + - Assign label S to all top-level blossoms that contain an unmatched vertex. + Add all vertices inside such blossoms to _Q_. + - Repeat until either an augmenting path is found or _δ1 = 0_: + - Scan all vertices in Q as described earlier. + Build an index of edges to unlabeled vertices or other S-blossoms. + Do not yet use such edges to change the alternating tree, even if the edge is tight. + - Calculate _δ_ and update dual variables as described above. + - If _δ = δ1_, end the search. + The maximum weight matching has been found. + - If _δ = δ2_, use the corresponding edge to grow the alternating tree. + Assign label T to the unlabeled blossom. + Then assign label S to its mate and add the new S-vertices to _Q_. + - If _δ = δ3_ and the corresponding edge connects two S-blossoms + in the same alternating tree, use the edge to create a new blossom. + Add the new S-vertices to _Q_. + - If _δ = δ3_ and the corresponding edge connects two S-blossoms + in different alternating trees, use the edge to construct an augmenting path. + End the search and return the augmenting path. + - If _δ = δ4_, expand the corresponding T-blossom. + +It may seem complicated, but this is actually easier. +The code that scans newly labeled S-vertices, no longer needs to treat tight edges specially. + +In general, multiple updates of the dual variables are necessary during a single _stage_ of +the algorithm. +Remember that a stage is the process of growing alternating trees until an augmenting path is found. A stage ends either by augmenting the matching, or by concluding that no further improvement is possible. -Each stage consists of one or more substages. -A substage scans tight edges to grow the alternating trees. -When a substage runs out of tight edges, it ends by performing a dual variable update. -A substage also ends immediately when it finds an augmenting path. -At the end of each stage, labels and alternating trees are removed. - -The matching algorithm ends when a substage ends in a dual variable update limited -by _δ1_. -At that point the matching has maximum weight. ### Expanding a blossom @@ -635,161 +661,323 @@ All vertices of sub-blossoms that got label S are inserted into _Q_. ![Figure 5](figures/graph5.png)
*Figure 5: Expanding a T-blossom* -### Keeping track of least-slack edges +### Keeping track of the top-level blossom of each vertex -To perform a dual variable update, the algorithm needs to compute the values +The algorithm often needs to find the top-level blossom _B(x)_ that contains a given vertex _x_. + +A naive implementation may keep this information is an array where the element with +index _x_ holds a pointer to blossom _B(x)_. +Lookup in this array would be fast, but keeping the array up-to-date takes too much time. +There can be _O(n)_ stages, and _O(n)_ blossoms can be created or expanded during a stage, +and a blossom can contain _O(n)_ vertices, +therefore the total number of updates to the array could add up to _O(n3)_. + +To solve this, we use a special data structure: a _concatenable priority queue_. +Each top-level blossom maintains an instance of this type of queue, containing its vertices. +Each vertex is a member in precisely one such queue. + +To find the top-level blossom _B(x)_ that contains a given vertex _x_, we determine +the queue instance in which the vertex is a member. +This takes time _O(log n)_. +The queue instance corresponds directly to a specific blossom, which we can find +for example by storing a pointer to the blossom inside the queue instance. + +When a new blossom is created, the concatenable queues of its sub-blossoms are merged +to form one concatenable queue for the new blossom. +Concatenating two queues produces a new queue that contains all members of the original queues. +This operation takes time _O(log n)_. +To merge the queues of _k_ sub-blossoms, the concatenation step is repeated _k-1_ times, +taking total time _O(k log n)_. + +When a blossom is expanded, its concatenable queue is un-concatenated to recover separate queues +for the sub-blossoms. +This also takes time _O(log n)_ for each sub-blossom. + +Implementation details of a concatenable queue will be discussed later in this document. + +### Lazy updating of dual variables + +During a delta step, the dual variables of labeled vertices and blossoms change as described above. +Updating these variables directly would take time _O(n)_ per delta step. +The total number of delta steps during a matching may be _Θ(n2)_, +pushing the total time to update dual variables to _O(n3)_ which is too slow. + +To solve this, [[4]](#galil_micali_gabow1986) describes a technique that stores dual values +in a _modified_ form which is invariant under delta steps. +The modified values can be converted back to the true dual values when necessary. +[[7]](#mehlhorn_schafer2002) describes a slightly different technique which I find easier +to understand. +My implementation is very similar to theirs. + +The first trick is to keep track of the running sum of _δ_ values since the beginning of the algorithm. +Let's call that number _Δ_. +At the start of the algorithm _Δ = 0_, but the value increases as the algorithm goes through delta steps. + +For each non-trivial blossom, rather than storing its true dual value, we store a _modified_ dual value: + + - For an S-blossom, the modified dual value is _z'B = zB - 2 Δ_ + - For a T-blossom, the modified dual value is _z'B = zB + 2 Δ_ + - For an unlabeled blossom or non-top-level blossom, the modified dual value is equal + to the true dual value. + +These modified values are invariant under delta steps. +Thus, there is no need to update the stored values during a delta step. + +Since the modified blossom dual value depends on the label (S or T) of the blossom, +the modified value must be updated whenever the label of the blossom changes. +This update can be done in constant time, and changing the label of a blossom is +in any case an explicit step, so this won't increase the asymptotic run time. + +For each vertex, rather than storing its true dual value, we store a _modified_ dual value: + + - For an S-vertex, the modified dual value is _u'x = ux + Δ_ + - For a T-vertex, the modified dual value is _u'x = ux - offsetB(x) - Δ_ + - For an unlabeled vertex, the modified dual value is _u'x = ux - offsetB(x)_ + +where _offsetB_ is an extra variable which is maintained for each top-level blossom. + +Again, the modified values are invariant under delta steps, which implies that no update +to the stored values is necessary during a delta step. + +Since the modified vertex dual value depends on the label (S or T) of its top-level blossom, +an update is necessary when that label changes. +For S-vertices, we can afford to apply that update directly to the vertices involved. +This is possible since a vertex becomes an S-vertex at most once per stage. + +The situation is more complicated for T-vertices. +During a stage, a T-vertex can become unlabeled if it is part of a T-blossom that gets expanded. +The same vertex can again become a T-vertex, then again become unlabeled during a subsequent +blossom expansion. +In this way, a vertex can transition between T-vertex and unlabeled vertex up to _O(n)_ times +within a stage. +We can not afford to update the stored modified vertex dual so many times. +This is where the _offset_ variables come in. +If a blossom becomes a T-blossom, rather than updating the modified duals of all vertices, +we update only the _offset_ variable of the blossom such that the modified vertex duals +remain unchanged. +If a blossom is expanded, we push the _offset_ values down to its sub-blossoms. + +### Efficiently computing _δ_ + +To perform a delta step, the algorithm computes the values of _δ1_, _δ2_, _δ3_ and _δ4_ -and determine which edge (_δ2_, _δ3_) or +and determines which edge (_δ2_, _δ3_) or blossom (_δ4_) achieves the minimum value. -The total number of dual updates during a matching may be _Θ(n2)_. -Since we want to limit the total number of steps of the matching algorithm to _O(n3)_, -each dual update may use at most _O(n)_ steps. +A naive implementation might compute _δ_ by looping over the vertices, blossoms and edges +in the graph. +The total number of delta steps during a matching may be _Θ(n2)_, +pushing the total time for _δ_ calculations to _O(n2 m)_ which is much too slow. +[[4]](#galil_micali_gabow1986) introduces a combination of data structures from which +the value of _δ_ can be computed efficiently. -We can find _δ1_ in _O(n)_ steps by simply looping over all vertices -and checking their dual variables. -We can find _δ4_ in _O(n)_ steps by simply looping over all non-trivial blossoms -(since there are fewer than _n_ non-trivial blossoms). -We could find _δ2_ and _δ3_ by simply looping over -all edges of the graph in _O(m)_ steps, but that exceeds our budget of _O(n)_ steps. -So we need better techniques. +_δ1_ is the minimum dual value of any S-vertex. +This value can be computed in constant time. +The dual value of an unmatched vertex is reduced by _δ_ during every delta step. +Since all vertex duals start with the same dual value _ustart_, +all unmatched vertices have dual value _ustart - Δ_, +which is the minimum dual value among all vertices. + +_δ3_ is half of the minimum slack of any edge between two different S-blossoms. +To compute this efficiently, we keep edges between S-blossoms in a priority queue. +The edges are inserted into the queue during scanning of newly labeled S-vertices. +To compute _δ3_, we simply find the minimum-priority element of the queue. -For _δ2_, we determine the least-slack edge between an S-blossom and unlabeled -blossom as follows. -For every vertex _y_ in any unlabeled blossom, keep track of _ey_: -the least-slack edge that connects _y_ to any S-vertex. -The thing to keep track of is the identity of the edge, not the slack value. -This information is kept up-to-date as part of the procedure that considers S-vertices. -The scanning procedure eventually considers all edges _(x, y, w)_ where _x_ is an S-vertex. -At that moment _ey_ is updated if the new edge has lower slack. +A complication may occur when a new blossom is created. +Edges that connect different top-level S-blossoms before creation of the new blossom, +may end up as internal edges inside the newly created blossom. +This implies that such edges would have to be removed from the _δ3_ priority queue, +but that would be quite difficult. +Instead, we just let those edges stay in the queue. +When computing the value of _δ3_, we thus have to check whether the minimum +element represents an edge between different top-level blossoms. +If not, we discard such stale elements until we find an edge that does. -Calculating _δ2_ then becomes a matter of looping over all vertices _x_, -checking whether _B(x)_ is unlabeled and calculating the slack of _ex_. +A complication occurs when dual variables are updated. +At that point, the slack of any edge between different S-blossoms decreases by _2\*δ_. +But we can not afford to update the priorities of all elements in the queue. +To solve this, we set the priority of each edge to its _modified slack_. -One subtle aspect of this technique is that a T-vertex can loose its label when -the containing T-blossom gets expanded. -At that point, we suddenly need to have kept track of its least-slack edge to any S-vertex. -It is therefore necessary to do the same tracking also for T-vertices. -So the technique is: For any vertex that is not an S-vertex, track its least-slack edge -to any S-vertex. +The _modified slack_ of an edge is defined as follows: -Another subtle aspect is that a T-vertex may have a zero-slack edge to an S-vertex. -Even though these edges are already tight, they must still be tracked. -If the T-vertex loses its label, this edge needs to be reconsidered by the scanning procedure. -By including these edges in least-slack edge tracking, they will be rediscovered -through a _δ2=0_ update after the vertex becomes unlabeled. +$$ \pi'_{x,y} = u'_x + u'_y - w_{x,y} $$ -For _δ3_, we determine the least-slack edge between any pair of S-blossoms -as follows. -For every S-blossom _B_, keep track of _eB_: -the least-slack edge between _B_ and any other S-blossom. -Note that this is done per S-blossoms, not per S-vertex. -This information is kept up-to-date as part of the procedure that considers S-vertices. -Calculating _δ3_ then becomes a matter of looping over all S-blossoms _B_ -and calculating the slack of _eB_. +The modified slack is computed in the same way as true slack, except it uses +the modified vertex duals instead of true vertex duals. +Blossom duals are ignored since we will never compute the modified slack of an edge that +is contained inside a blossom. -A complication occurs when S-blossoms are merged. -Some of the least-slack edges of the sub-blossoms may be internal edges in the merged blossom, -and therefore irrelevant for _δ3_. -As a result, the proper _eB_ of the merged blossom may be different from all -least-slack edges of its sub-blossoms. -An additional data structure is needed to find _eB_ of the merged blossom. +Because modified vertex duals are invariant under delta steps, so is the modified edge slack. +As a result, the priorities of edges in the priority queue remain unchanged during a delta step. -For every S-blossom _B_, maintain a list _LB_ of edges between _B_ and -other S-blossoms. -The purpose of _LB_ is to contain, for every other S-blossom, the least-slack edge -between _B_ and that blossom. -These lists are kept up-to-date as part of the procedure that considers S-vertices. -While considering vertex _x_, if edge _(x, y, w)_ has positive slack, -and _B(y)_ is an S-blossom, the edge is inserted in _LB(x)_. -This may cause _LB(x)_ to contain multiple edges to _B(y)_. -That's okay as long as it definitely contains the least-slack edge to _B(y)_. +_δ4_ is half of the minimum dual variable of any T-blossom. +To compute this efficiently, we keep non-trivial T-blossoms in a priority queue. +The blossoms are inserted into the queue when they become a T-blossom and removed from +the queue when they stop being a T-blossom. -When a new S-blossom is created, form its list _LB_ by merging the lists -of its sub-blossoms. -Ignore edges that are internal to the merged blossom. -If there are multiple edges to the same target blossom, keep only the least-slack of these edges. -Then find _eB_ of the merged blossom by simply taking the least-slack edge -out of _LB_. +A complication occurs when dual variables are updated. +At that point, the dual variable of any T-blossom decreases by _2\*δ_. +But we can not afford to update the priorities of all elements in the queue. +To solve this, we set the priority of each blossom to its _modified_ dual value +_z'B = zB + 2\*Δ_. +These values are invariant under delta steps. +_δ2_ is the minimum slack of any edge between an S-vertex and unlabeled vertex. +To compute this efficiently, we use a fairly complicated strategy that involves +three levels of priority queues. + +At the lowest level, every T-vertex or unlabeled vertex maintains a separate priority queue +of edges between itself and any S-vertex. +Edges are inserted into this queue during scanning of newly labeled S-vertices. +Note that S-vertices do not maintain this type of queue. + +The priorities of edges in these queues are set to their _modified slack_. +This ensures that the priorities remain unchanged during delta steps. +The priorities also remain unchanged when the T-vertex becomes unlabeled or the unlabeled +vertex becomes a T-vertex. + +At the middle level, every T-blossom or unlabeled top-level maintains a priority queue +containing its vertices. +This is in fact the _concatenable priority queue_ instance that is maintained by every +top-level blossom as described earlier in this document. +The priority of each vertex in the queue is set to the minimum priority of any edge +in the low-level queue of that vertex. +If edges are added to (or removed from) the low-level queue, the priority of the corresponding +vertex in the mid-level queue may change. +If the low-level queue of a vertex is empty, that vertex has priority _Inf_ in the mid-level queue. + +At the highest level, unlabeled top-level blossoms are tracked in one global priority queue. +The priority of each blossom in this queue is set to the minimum slack of any edge +from that blossom to an S-vertex plus _Δ_. +These priorities are invariant under delta steps. + +To compute _δ2_, we find the minimum priority in the high-level queue +and adjust it by _Δ_. +To find the edge associated with _δ2_, +we use the high-level queue to find the unlabeled blossom with minimum priority, +then use that blossom's mid-level queue to find the vertex with minimum priority, +then use that vertex's low-level queue to find the edge with minimum priority. + +The whole thing is a bit tricky, but it works. + +### Re-using alternating trees + +According to [[5]], labels and alternating trees should be erased at the end of each stage. +However, the algorithm can be optimized by keeping some of the labels and re-using them +in the next stage. +The optimized algorithm erases _only_ the two alternating trees that are part of +the augmenting path. +All blossoms in those two trees lose their labels and become free blossoms again. +Other alternating trees, which are not involved in the augmenting path, are preserved +into the next stage, and so are the labels on the blossoms in those trees. + +This optimization is well known and is described for example in [[7]](#mehlhorn_schafer2002). +It does not affect the worst-case asymptotic run time of the algorithm, +but it provides a significant practical speed up for many types of graphs. + +Erasing alternating trees is easy enough, but selectively stripping labels off blossoms +has a few implications. +The blossoms that lose their labels need to have their modified dual values updated. +The T-blossoms additionally need to have their _offsetB_ variables updated +to keep the vertex dual values consistent. +For S-blossoms that lose their labels, the modified vertex dual variables are updated directly. + +The various priority queues also need updating. +Former T-blossoms must be removed from the priority queue for _δ4_. +Edges incident on former S-vertices must be removed from the priority queue for _δ3_. +Finally, S-vertices that become unlabeled need to construct a proper priority queue +of incident edges to other S-vertices for _δ2_ tracking. +This involves visiting every incident edge of every vertex in each S-blossom that loses its label. ## Run time of the algorithm Every stage of the algorithm either increases the number of matched vertices by 2 or ends the matching. Therefore the number of stages is at most _n/2_. -Every stage runs in _O(n2)_ steps, therefore the complete algorithm runs in -_O(n3)_ steps. - -During each stage, edge scanning is driven by the queue _Q_. -Every vertex enters _Q_ at most once. -Each vertex that enters _Q_ has its incident edges scanned, causing every edge in the graph -to be scanned at most twice per stage. -Scanning an edge is done in constant time, unless it leads to the discovery of a blossom -or an augmenting path, which will be separately accounted for. -Therefore edge scanning needs _O(m)_ steps per stage. +Every stage runs in _O((n + m) log n)_ steps, therefore the complete algorithm runs in +_O(n (n + m) log n)_ steps. Creating a blossom reduces the number of top-level blossoms by at least 2, thus limiting the number of simultaneously existing blossoms to _O(n)_. Blossoms that are created during a stage become S-blossoms and survive until the end of that stage, therefore _O(n)_ blossoms are created during a stage. +Creating a blossom with _k_ sub-blossoms reduces the number of top-level blossoms by _k-1_, +thus limiting the total number of sub-blossoms that can be involved in blossom creation +during a stage to _O(n)_. + Creating a blossom involves tracing the alternating path to the closest common ancestor, -and some bookkeeping per sub-blossom, -and inserting new S-vertices _Q_, -all of which can be done in _O(n)_ steps per blossom creation. -The cost of managing least-slack edges between S-blossoms will be separately accounted for. -Therefore blossom creation needs _O(n2)_ steps per stage -(excluding least-slack edge management). +which takes time _O(k log n)_ for a blossom with _k_ sub-blossoms. +It also involves bookkeeping per sub-blossom, which takes time _O(log n)_ per sub-blossom. +It also involves relabeling former T-vertices as S-vertices, but I account for that +time separately below so I can ignore it here. +It also involves merging the concatenable queues which track the vertices in top-level blossoms. +Merging two queues takes time _O(log n)_, therefore merging the queues of all sub-blossoms +takes time _O(k log n)_. +Creating a blossom thus takes time _O(k log n)_. +Blossom creation thus takes total time _O(n log n)_ per stage. -As part of creating a blossom, a list _LB_ of least-slack edges must be formed. -This involves processing every element of all least-slack edge lists of its sub-blossoms, -and removing redundant edges from the merged list. -Merging and removing redundant edges can be done in one sweep via a temporary array indexed -by target blossom. -Collect the least-slack edges of the sub-blossoms into this array, -indexed by the target blossom of the edge, -keeping only the edge with lowest slack per target blossom. -Then convert the array back into a list by removing unused indices. -This takes _O(1)_ steps per candidate edge, plus _O(n)_ steps to manage the temporary array. -I choose to shift the cost of collecting the candidate edges from the sub-blossoms to -the actions that inserted those edges into the sub-blossom lists. -There are two processes which insert edges into _LB_: edge scanning and blossom -creation. -Edge scanning inserts each graph edge at most twice per stage for a total cost of _O(m)_ steps -per stage. -Blossom creation inserts at most _O(n)_ edges per blossom creation. -Therefore the total cost of S-blossom least-slack edge management is -_O(m + n2) = O(n2)_ steps per stage. +During each stage, a blossom becomes an S-blossom or T-blossom at most once. +A blossom also becomes unlabeled at most once, at the end of the stage. +Changing the label of a blossom takes some simple bookkeeping, as well as operations +on priority queues (_δ4_ for T-blossoms, _δ2_ for unlabeled +blossoms) which take time _O(log n)_ per blossom. +Assigning label S or removing label S also involves some work per vertex in the blossom, +but I account for that time separately below so I can ignore it here. +Blossom labeling thus takes total time _O(n log n)_ per stage. +During each stage, an vertex becomes an S-vertex at most once, and an S-vertex becomes +unlabeled at most once. +In both cases, the incident edges of the affected vertex are scanned and potentially +added to or removed from priority queues. +This involves finding the top-level blossoms of the endpoints of each edge, which +takes time _O(log n)_ per edge. +The updates to priority queues also take time _O(log n)_ per edge. +Edge scanning thus takes total time _O(m log n)_ per stage. + +Note that _m ≤ n2_ therefore _log m ≤ 2 log n_. +This implies that an operation on a priority queue with _m_ elements takes time _O(log n)_. + +Expanding a blossom involves some bookkeeping which takes time _O(log n)_ per sub-blossom. +It also involves splitting the concatenable queue that tracks the vertices in top-level blossoms, +which takes time _O(log n)_ per sub-blossom. +In case of a T-blossom, it also involves reconstructing the alternating path through +the blossom which takes time _O(k log n)_ for _k_ sub-blossoms. +Also in case of a T-blossom, some sub-blossoms will become S-blossoms and their +vertices become S-vertices, but I have already accounted for that cost above +so I can ignore it here. +Expanding a blossom thus takes time _O(k log n)_. The number of blossom expansions during a stage is _O(n)_. -Expanding a blossom involves some bookkeeping per sub-blossom, -and reconstructing the alternating path through the blossom, -and inserting any new S-vertices into _Q_, -all of which can be done in _O(n)_ steps per blossom. -Therefore blossom expansion needs _O(n2)_ steps per stage. +Blossom expansion thus takes total time _O(n log n)_ per stage. The length of an augmenting path is _O(n)_. -Tracing the augmenting path and augmenting the matching along the path can be done in _O(n)_ steps. -Augmenting through a blossom takes a number of steps that is proportional in the number of +Tracing the augmenting path and augmenting the matching along the path can be done +in _O(n log n)_ steps. +Augmenting through a blossom takes a number of steps that is proportional to the number of its sub-blossoms. Since there are fewer than _n_ non-trivial blossoms, the total cost of augmenting through blossoms is _O(n)_ steps. -Therefore the total cost of augmenting is _O(n)_ steps per stage. +Augmenting thus takes total time _O(n log n)_ per stage. -A dual variable update limited by _δ1_ ends the algorithm and therefore -happens at most once. -An update limited by _δ2_ labels a previously labeled blossom -and therefore happens _O(n)_ times per stage. -An update limited by _δ3_ either creates a blossom or finds an augmenting path -and therefore happens _O(n)_ times per stage. -An update limited by _δ4_ expands a blossom and therefore happens -_O(n)_ times per stage. -Therefore the number of dual variable updates is _O(n)_ per stage. -The cost of calculating the _δ_ values is _O(n)_ per update as discussed above. -Applying changes to the dual variables can be done by looping over all vertices and looping over -all top-level blossoms in _O(n)_ steps per update. -Therefore the total cost of dual variable updates is _O(n2)_ per stage. +A delta step limited by _δ1_ ends the algorithm and therefore happens at most once. +A _δ2_ step assigns a label to a previously unlabeled blossom and therefore +happens _O(n)_ times per stage. +A _δ3_ step either creates a blossom or finds an augmenting path and therefore +happens _O(n)_ times per stage. +A _δ4_ step expands a blossom and therefore happens _O(n)_ times per stage. +Therefore the number of delta steps is _O(n)_ per stage. +Calculating _δ1_ takes constant time. +Calculating _δ2_ and _δ4_ requires a constant number +of lookups in priority queues which takes time _O(log n)_ per delta step. +During calculation of _δ3_, it may be necessary to remove stage edges +from the priority queue. +Since every edge is inserted into the priority queue at most once per stage, +at most _O(m)_ edges are removed per stage, which takes total time _O(m log n)_ per stage. +Calculation of _δ_ thus takes total time _O((n + m) log n)_ per stage. + +Applying updates to dual variables is done in a lazy fashion as discussed above. +The only variable that is updated directly is _Δ_, which takes time _O(1)_ per delta step. +Updating dual variables thus takes total time _O(n)_ per stage. ## Implementation details @@ -819,17 +1007,101 @@ Every edge therefore appears in two adjacency lists. These data structures are initialized at the start of the matching algorithm and never change while the algorithm runs. +#### Priority queue + +Priority queues are used for a number of purposes: + + - a priority queue to find the least-slack edge between S-blossoms; + - a priority queue to find the minimum-dual T-blossom; + - a priority queue to find the unlabeled blossom with least-slack edge to an S-blossom; + - a separate priority queue per vertex to find the least-slack edge between that vertex + and any S-vertex. + +This type of queue is implemented as a binary heap. +It supports the following operations: + + - _insert_ a new element with specified priority in time _O(log n)_; + - find the element with _minimum_ priority in time _O(1)_; + - _delete_ a specified element in time _O(log n)_; + - _change_ the priority of a specified element in time _O(log n)_. + +#### Concatenable priority queue + +Each top-level blossom maintains a concatenable priority queue containing its vertices. +We use a specific type of concatenable queue that supports the following operations +[[4]](#galil_micali_gabow1986) [[5]](#aho_hopcroft_ullman1974): + + - _create_ a new queue containing 1 new element; + - find the element with _minimum_ priority in time _O(1)_; + - _change_ the priority of a given element; + - _merge_ two queues into one new queue in time _O(log n)_; + - _split_ a queue, thus undoing the previous _merge_ step in time _O(log n)_. + +The efficient _merge_ and _split_ operations make it possible to adapt the queue during +blossom creation and blossom expansion steps. +The priorities in the queue are used to find, for a given top-level blossom, its vertex +with least-slack edge to an S-blossom. + +The concatenable queue is implemented as a balanced tree, specifically a _2-3 tree_. +Each internal node of a 2-3 tree has either 2 or 3 children. +The leaf nodes of the tree represent the elements of the queue. +All leaf nodes are at the same distance from the root. +Each node has a pointer to its parent, and each internal node has pointers to its children. +Each internal node also stores its height (distance to its leaf nodes). + +Only leaf nodes have a priority. +However, each internal node maintains a pointer to the leaf node with minimum priority +within its subtree. +As a consequence, the root of the tree has a pointer to the element with minimum priority. +To keep this information consistent, any change in the priority of a leaf node must +be followed by updating the internal nodes along a path from the leaf node to the root. +The same must be done when the structure of the tree is adjusted. + +The left-to-right order of the leaf nodes is preserved during all operations, including _merge_ +and _split_. +When trees _A_ and _B_ are merged, the sequence of leaf nodes in the merged tree will consist of +the leaf nodes _A_ followed by the leaf nodes of _B_. +Note that the left-to-right order of the leaf nodes is unrelated to the priorities of the elements. + +To merge two trees, the root of the smaller tree is inserted as a child of an appropriate node +in the larger tree. +Subsequent adjustments are needed restore the consistency the 2-3 tree and to update +the minimum-element pointers of the internal nodes along a path from the insertion point +to the root of the merged tree. +This can be done in a number of steps proportional to the difference in height between +the two trees, which is in any case _O(log n)_. + +To split a tree, a _split node_ is identified: the left-most leaf node that must end up +in the right-side tree after splitting. +Internal nodes are deleted along the path from the _split node_ to the root of the tree. +This creates a forest of disconnected subtrees on the left side of the path, +and a similar forest of subtrees on the right side of the split path. +The left-side subtrees are reassembled into a single tree through a series of _merge_ steps. +Although _O(log n)_ merge steps may be needed, the total time required for reassembly +is also _O(log n)_ because each merge step combines trees of similar height. +A similar reassembly is done for the forest on the right side of the split path. + +The concatenable queues have an additional purpose in the matching algorithm: +finding the top-level blossom that contains a given vertex. +To do this, we assign a _name_ to each concatenable queue instance, which is simply +a pointer to the top-level blossom that maintains the queue. +An extra operation is defined: +_find_ the name of the queue instance that contains a given element in time _O(log n)_. + +Implementing the _find_ operation is easy: +Starting at the leaf node that represents the element, follow _parent_ pointers +to the root of the tree. +The root node contains the name of the queue. + #### General data `vertex_mate[x] = y` if the edge between vertex _x_ and vertex _y_ is matched.
`vertex_mate[x] = -1` if vertex _x_ is unmatched. -`vertex_top_blossom[x] =` pointer to _B(x)_, the top-level blossom that contains vertex _x_. +`vertex_dual[x]` holds the modified vertex dual _u'x_. -`vertex_dual[x]` holds the value of _ux_. - -A FIFO queue holds vertex indices of S-vertices whose edges have not yet been scanned. -Vertices are inserted in this queue as soon as their top-level blossom gets label S. +A global list holds vertex indices of S-vertices whose edges have not yet been scanned. +Vertices are inserted in this list when their top-level blossom gets label S. #### Blossoms @@ -842,56 +1114,28 @@ Both types of blossoms are represented as class instances with the following att * `B.label` is `S` or `T` or `None` * `B.tree_edge = (x, y)` if _B_ is a labeled top-level blossom, where _y_ is a vertex in _B_ and _(x, y)_ is the edge that links _B_ to its parent in the alternating tree. + * `B.tree_blossoms` points to a list of blossoms in the same alternating tree, if _B_ + is a labeled top-level blossom. + * `B.vertex_dual_offset` holds the pending change to vertex duals _offsetB_. A non-trivial blossom additionally has the following attributes: * `B.subblossoms` is an array of pointers to the sub-blossoms of _B_, starting with the sub-blossom that contains the base vertex. * `B.edges` is an array of alternating edges connecting the sub-blossoms. - * `B.dual_var` holds the value of _zB_. + * `B.dual_var` holds the modified blossom dual _z'B_. Single-vertex blossoms are kept in an array indexed by vertex index.
-Non-trivial blossoms are kept in a separate array.
+Non-trivial blossoms are kept in a separate list.
These arrays are used to iterate over blossoms and to find the trivial blossom that consists of a given vertex. -#### Least-slack edge tracking - -`vertex_best_edge[x]` is an array holding _ex_, the edge index of -the least-slack edge between vertex _x_ and any S-vertex, or -1 if there is no such edge. -This value is only meaningful if _x_ is a T-vertex or unlabeled vertex. - -`B.best_edge` is a blossom attribute holding _eB_, the edge index of the least-slack -edge between blossom _B_ and any other S-blossom, or -1 if there is no such edge. -This value is only meaningful if _B_ is a top-level S-blossom. - -For non-trivial S-blossoms _B_, attribute `B.best_edge_set` holds the list _LB_ -of potential least-slack edges to other blossoms. -This list is not maintained for single-vertex blossoms, since _LB_ of a single vertex -can be efficiently constructed from its adjacency list. - #### Memory usage The data structures described above use a constant amount of memory per vertex and per edge and per blossom. Therefore the total memory requirement is _O(m + n)_. -The memory usage of _LB_ is a little tricky. -Any given list _LB_ can have length _O(n)_, and _O(n)_ of these lists can exist -simultaneously. -Naively allocating space for _O(n)_ elements per list will drive memory usage -up to _O(n2)_. -However, at any moment, an edge can be in at most two of these lists, therefore the sum -of the lengths of these lists is limited to _O(m)_. -A possible solution is to implement the _LB_ as linked lists. - -### Performance critical routines - -Calculations that happen very frequently in the algorithm are: -determining the top-level blossom of a given vertex, and calculating the slack of a given edge. -These calculations must run in constant time per call in any case, but it makes sense to put -some extra effort into making these calculations _fast_. - ### Recursion Certain tasks in the algorithm are recursive in nature: @@ -948,61 +1192,42 @@ Proof by induction that all vertex duals are multiples of 0.5 and all blossom du - Blossom duals increase or decrease by _2\*δ_, therefore updated blossom duals are still integers. -The value of vertex dual variables and blossom dual variables never exceeds the -greatest edge weight in the graph. -This may be helpful for choosing an integer data type for the dual variables. +It is useful to know that (modified) dual variables and (modified) edge slacks +are limited to a finite range of values which depends on the maximum edge weight. +This may be helpful when choosing an integer data type for these variables. (Alternatively, choose a programming language with unlimited integer range. This is perhaps the thing I love most about Python.) -Proof that dual variables do not exceed _max-weight_: - - - Vertex dual variables start at _ux = 0.5\*max-weight_. - - While the algorithm runs, there is at least one vertex which has been unmatched - since the beginning. - This vertex has always had label S, therefore its dual always decreased by _δ_ - during a dual variable update. - Since it started at _0.5\*max-weight_ and can not become negative, - the sum of _δ_ over all dual variable updates can not exceed _0.5\*max-weight_. - - Vertex dual variables increase by at most _δ_ per update. - Therefore no vertex dual can increase by more than _0.5\*max-weight_ in total. - Therefore no vertex dual can exceed _max-weight_. - - Blossom dual variables start at _zB = 0_. - - Blossom dual variables increase by at most _2\*δ_ per update. - Therefore no blossom dual can increase by more than _max-weight_ in total. - Therefore no blossom dual can exceed _max-weight_. + - The value of _Δ_ (sum over _δ_ steps) does not exceed _maxweight / 2_. + Proof: + - Vertex dual variables start at _ux = maxweight_ / 2. + - While the algorithm runs, there is at least one vertex which has been unmatched + since the beginning. + This vertex has always had label S, therefore its dual is _maxweight/2 - Δ_. + Vertex deltas can not be negative, therefore _Δ ≤ maxweight/2_. + - Vertex duals are limited to the range 0 to _maxweight_. + - Blossom duals are limited to the range 0 to _maxweight_. + - Edge slack is limited to the range 0 to _2\*maxweight_. + - Modified vertex duals are limited to the range 0 to _1.5\*maxweight_. + - Modified blossom duals are limited to the range _-maxweight to 2\*maxweight_. + - Modified edge slack is limited to the range 0 to _3\*maxweight_. + - Dual offsets are limited to the range _-maxweight/2_ to _maxweight/2_. ### Handling floating point edge weights Floating point calculations are subject to rounding errors. -This has two consequences for the matching algorithm: +As a result, the algorithm may return a matching which has slightly lower weight than +the actual maximum weight. - - The algorithm may return a matching which has slightly lower weight than - the actual maximum weight. - - - The algorithm may not reliably recognize tight edges. - To check whether an edge is tight, its slack is compared to zero. - Rounding errors may cause the slack to appear positive even when an exact calculation - would show it to be zero. - The slack of some edges may even become slightly negative. - - I believe this does not affect the correctness of the algorithm. - An edge that should be tight but is not recognized as tight due to rounding errors, - can be pulled tight through an additional dual variable update. - As side-effect of this update, the edge will immediately be used to grow the alternating tree, - or construct a blossom or augmenting path. - This mechanism allows the algorithm to make progress, even if slack comparisons - are repeatedly thrown off by rounding errors. - Rounding errors may cause the algorithm to perform more dual variable updates - than strictly necessary. - But this will still not cause the run time of the algorithm to exceed _O(n3)_. - -It seems to me that the matching algorithm is stable for floating point weights. -And it seems to me that it returns a matching which is close to optimal, -and could have been optimal if edge weights were changed by small amounts. - -I must admit these arguments are mostly based on intuition. +I believe the matching algorithm is stable for floating point weights. +It seems to me that the algorithm will always return a matching that is close to optimal, +and could have been optimal if the edge weights were changed by very small amounts. +I must admit this is mostly based on intuition. Unfortunately I don't know how to properly analyze the floating point accuracy of this algorithm. +The most challenging cases are probably graphs where edge weights differ by many orders +of magnitude. + ### Finding a maximum weight matching out of all maximum cardinality matchings It is sometimes useful to find a maximum cardinality matching which has maximum weight @@ -1057,7 +1282,15 @@ changing all weights by the same amount doesn't change which of these matchings ([link](https://dl.acm.org/doi/abs/10.5555/320176.320229)) ([pdf](https://dl.acm.org/doi/pdf/10.5555/320176.320229)) + 7. + Kurt Mehlhorn, Guido Schäfer, "Implementation of O(nm log(n)) Weighted Matchings in General Graphs: The Power of Data Structures", _Journal of Experimental Algorithmics vol. 7_, 2002. + ([link](https://dl.acm.org/doi/10.1145/944618.944622)) + ([pdf](https://sci-hub.se/https://doi.org/10.1145/944618.944622)) + + 8. + Alfred V. Aho, John E. Hopcroft, Jeffrey D. Ullman, + _The Design and Analysis of Computer Algorithms_, Addison-Wesley, 1974. --- -Written in 2023 by Joris van Rantwijk. +Written in 2023-2024 by Joris van Rantwijk. This work is licensed under [CC BY-ND 4.0](https://creativecommons.org/licenses/by-nd/4.0/). diff --git a/python/mwmatching/algorithm.py b/python/mwmatching/algorithm.py index 3293fa2..c526500 100644 --- a/python/mwmatching/algorithm.py +++ b/python/mwmatching/algorithm.py @@ -470,6 +470,9 @@ class NonTrivialBlossom(Blossom): # The value of the dual variable changes through delta steps, # but these changes are implemented as lazy updates. # + # blossom.dual_var holds the modified blossom dual value. + # The modified blossom dual is invariant under delta steps. + # # The true dual value of a top-level S-blossom is # blossom.dual_var + ctx.delta_sum_2x # @@ -479,7 +482,6 @@ class NonTrivialBlossom(Blossom): # The true dual value of any other type of blossom is simply # blossom.dual_var # - # Note that "dual_var" is invariant under delta steps. self.dual_var: float = 0 # If this is a top-level T-blossom, @@ -575,6 +577,9 @@ class MatchingContext: # The value of the dual variable changes through delta steps, # but these changes are implemented as lazy updates. # + # vertex_dual_2x[x] holds 2 times the modified vertex dual value of + # vertex "x". The modified vertex dual is invariant under delta steps. + # # The true dual value of an S-vertex is # (vertex_dual_2x[x] - delta_sum_2x) / 2 # @@ -584,7 +589,6 @@ class MatchingContext: # The true dual value of an unlabeled vertex is # (vertex_dual_2x[x] + B(x).vertex_dual_offset) / 2 # - # Note that "vertex_dual_2x" is invariant under delta steps. self.vertex_dual_2x: list[float] self.vertex_dual_2x = num_vertex * [self.start_vertex_dual_2x] @@ -648,14 +652,14 @@ class MatchingContext: The pseudo-slack of an edge is related to its true slack, but adjusted in a way that makes it invariant under delta steps. - If the edge connects two S-vertices in different top-level blossoms, - the true slack is the pseudo-slack minus 2 times the running sum - of delta steps. + The true slack of an edge between to S-vertices in different + top-level blossoms is + edge_pseudo_slack_2x(e) / 2 - delta_sum_2x - If the edge connects an S-vertex to an unlabeled vertex, - the true slack is the pseudo-slack minus the running sum of delta - steps, plus the pending offset of the top-level blossom that contains - the unlabeled vertex. + The true slack of an edge between an S-vertex and an unlabeled + vertex "y" inside top-level blossom B(y) is + (edge_pseudo_slack_2x(e) + - delta_sum_2x + B(y).vertex_dual_offset) / 2 """ (x, y, w) = self.graph.edges[e] return self.vertex_dual_2x[x] + self.vertex_dual_2x[y] - 2 * w @@ -1485,7 +1489,7 @@ class MatchingContext: def augment_matching(self, path: AlternatingPath) -> None: """Augment the matching through the specified augmenting path. - This function takes time O(n). + This function takes time O(n * log(n)). """ # Check that the augmenting path starts and ends in