Reputation: 135
So everywhere I see weighted union find algorithm, they use this approach:
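Roughly like this (a minimal Python sketch of weighted quick-union with union by size; the exact code varies from source to source, and the names here are illustrative):

```python
# Weighted quick-union (union by size), minimal sketch.
# parent[i] is the immediate parent of i; size[i] is the size of the
# tree rooted at i (only meaningful while i is a root).
n = 10
parent = list(range(n))
size = [1] * n

def find(x):
    # Walk up the parent pointers until we hit a root (self-loop).
    while parent[x] != x:
        x = parent[x]
    return x

def union(x, y):
    rx, ry = find(x), find(y)
    if rx == ry:
        return
    # Attach the smaller tree under the larger, keeping depth O(lg N).
    if size[rx] < size[ry]:
        rx, ry = ry, rx
    parent[ry] = rx
    size[rx] += size[ry]
```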
Time complexity here is O(lg N) per operation.
Now, an optimization on this is to flatten the trees, i.e., whenever I am calculating the root of a particular node, set all nodes on that path to point directly to that root.
The amortized time complexity of this is O(lg* N).
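In code, that flattening might look like this (a sketch reusing the parent array above, with full two-pass path compression):

```python
def find(x):
    # First pass: locate the root.
    root = x
    while parent[root] != root:
        root = parent[root]
    # Second pass: point every node on the path directly at the root.
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root
```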
This I can understand, but what I don't get is: why don't they start off with an array/hash map in which each node points directly to the root (instead of to its immediate parent)? That would bring the Find time complexity down to O(1).
Upvotes: 1
Views: 1082
Reputation: 193
You are correct that the solution you suggest will bring down the time complexity of a Find operation to O(1). However, doing so will make a Union operation slower.
Imagine that you use an array/hash table to remember the representative (or the root, as you call it) of each node. When you perform a union operation between two nodes x and y, you would either need to update all the nodes with the same representative as x to have y's representative, or vice versa. This way, Union runs in O(min{|S_x|, |S_y|}), where S_x is the set of nodes with the same representative as x. These sets can be significantly larger than log n.
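To make that concrete, here is an illustrative Python sketch (not from any particular library) that keeps a member list per representative, so Find is a single lookup but Union must relabel the whole smaller set:

```python
# "Quick-find": every node stores its representative directly.
n = 10
rep = list(range(n))                  # rep[i]: representative of node i
members = {i: [i] for i in range(n)}  # members[r]: nodes represented by r

def find(x):
    return rep[x]  # O(1)

def union(x, y):
    rx, ry = rep[x], rep[y]
    if rx == ry:
        return
    # Relabel the smaller set into the larger: O(min{|S_x|, |S_y|}).
    if len(members[rx]) > len(members[ry]):
        rx, ry = ry, rx
    smaller = members.pop(rx)
    for node in smaller:
        rep[node] = ry
    members[ry].extend(smaller)
```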
The weighted union algorithm, on the other hand, takes O(log n) time for both Find and Union.
So it's a trade-off. If you expect to do many Find operations, but few Union operations, you should use the solution you suggest. If you expect to do many of each, you can use the weighted union algorithm to avoid excessively slow operations.
Upvotes: 0
Reputation: 254
I am going to assume that the time complexity you are asking for is the time to check whether two nodes belong to the same set.
The key is in how sets are joined: you take the root of one set (the smaller one) and have it point to the root of the other set. Let the two sets have p and q as their roots respectively, and let |p| denote the size of the set when p is a root; in general, |p| is the number of items whose path to their root goes through p (that is, p itself plus all of its descendants).
We can assume without loss of generality that |p| <= |q| (otherwise we just swap their names). We then have |p ∪ q| = |p| + |q| >= 2|p|. This shows that each subtree in the data structure can be at most half as big as the subtree rooted at its parent, so given N items the tree can have depth at most 1 + lg N = O(lg N).
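As an illustrative sanity check (a Python sketch with assumed names, not code from the post): repeatedly unioning equal-sized sets builds the deepest trees weighted union can produce, and the measured depth comes out to exactly lg N:

```python
# Worst case for weighted (union-by-size) quick-union: merge
# singletons into pairs, pairs into fours, and so on.
N = 16
parent = list(range(N))
size = [1] * N

def find(x):
    while parent[x] != x:
        x = parent[x]
    return x

def union(x, y):
    rx, ry = find(x), find(y)
    if rx == ry:
        return
    if size[rx] > size[ry]:
        rx, ry = ry, rx
    parent[rx] = ry          # smaller (or tied) root goes under the larger
    size[ry] += size[rx]

step = 1
while step < N:
    for i in range(0, N, 2 * step):
        union(i, i + step)
    step *= 2

def depth(x):
    d = 0
    while parent[x] != x:
        x, d = parent[x], d + 1
    return d

print(max(depth(v) for v in range(N)))  # prints 4, i.e. lg 16
```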
If the two chosen items are as far from their roots as possible, it will take O(lg N) operations to find the root of each of their sets, since you only need O(1) operations to move up one layer in the tree, and then O(1) operations to compare the two roots.
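Putting that together (a one-liner sketch, assuming a find like the one above):

```python
def connected(x, y):
    # Two O(lg N) root computations plus one O(1) comparison.
    return find(x) == find(y)
```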
This cost also applies to each union operation itself, since you need to figure out which two roots to merge. The reason we don't just have all nodes point directly to the root is several-fold. First, we would need to update every node in the set each time we perform a union. Second, we only have edges pointing from the nodes toward the root, not the other way, so we would have to look through all the nodes to find the ones that need changing. Third, we have good optimizations, such as the path compression you describe, that recover most of this benefit while still working within the tree representation. Finally, you could do such a flattening step at the end if you really need to, but it would cost O(N lg N) time to perform, which is comparable to how long it takes to run the entire algorithm itself without the path-compression short-cut.
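If you did want every node pointing straight at its root at the end, that final pass could look like this (a sketch, assuming the parent array and find from the snippet above):

```python
# One final flattening pass: N finds at O(lg N) each = O(N lg N).
for v in range(N):
    parent[v] = find(v)
```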
Upvotes: 0