Maksim Dmitriev
Maksim Dmitriev

Reputation: 6209

Understanding union find

I read the wiki page and don't understand why it matters to append a smaller list to a larger one.

Here is that part of the algorithm from the wiki page:

Suppose you have a collection of lists and each node of each list contains an object, the name of the list to which it belongs, and the number of elements in that list. Also assume that the total number of elements in all lists is n (i.e. there are n elements overall). We wish to be able to merge any two of these lists, and update all of their nodes so that they still contain the name of the list to which they belong. The rule for merging the lists A and B is that if A is larger than B then merge the elements of B into A and update the elements that used to belong to B, and vice versa.

Upvotes: 0

Views: 1722

Answers (3)

Daniel
Daniel

Reputation: 7734

Union-Find is just a way for you to keep who is the "leader" of some set.

For example, suppose you have 5 people A, B, C, D, E.

Initially each one starts on its own set, so you can code it like this:

for each person in people:
    parent[person] = person

this way you can set every person to be the leader of its own set.

this is what it should look like:

{A}   {B}   {C}   {D}   {E}

the first operation we need is to be able to find the leader of a set, and this operation is called find.

then we fall into a property: a leader is someone who is its own parent.

there are two possibilities, or the person is its own parent or it's not.

if it is, then it is the leader of the set.

if it is not, then we will ask the same thing (if it is its own parent) to its parent, and so it goes.

you can code it like this:

find_leader_of(person P){
    if(parent[P] == P) return P
    else return find_leader_of(parent[P])
}

then we have the union operation, that's nothing more than turn 2 disjoints sets in one set only.

suppose you have this situation:

{A, B}        {C, D}         {E}

and you do union(B, D), then what happens?

first you need to find the leader of both sets:

fst_leader = find_leader_of(B)
snd_leader = find_leader_of(D)

then you choose any of these leaders to be the leader of the other:

parent[snd_leader] = fst_leader

and this results in:

union(person P1, person P2){
    fst_leader = find_leader_of(P!)
    snd_leader = find_leader_of(P2)
    parent[snd_leader] = fst_leader
}

there are other approaches to improve union-find and other ways to choose who will be leader of who, but this is the basics you must understand in order to know what union-find is about.

Upvotes: 2

EBo
EBo

Reputation: 377

Unless you want to use the naive approach, you might want to look at the Hoshen-Kopelman Algorithm https://www.ocf.berkeley.edu/~fricke/projects/hoshenkopelman/hoshenkopelman.html -- which has an amortized complexity of N*A (where A is the inverse Ackerman function -- which asymptotically approaches the constant 6). In other words, it is WICKED FAST.

If I am not mistaken, a number of people have looked at the overall efficiency when collapsing the trees -- ie when it is most efficient to do so. I cannot point you to those studies, but I do remember reading them in a text book a decade ago.

Hope this helps.

Upvotes: 0

Peter de Rivaz
Peter de Rivaz

Reputation: 33519

This is describing a naive approach for performing the update operation in which you will iterate over all the elements in one of the lists and change the label.

It will be faster to iterate over a shorter list so it makes sense to merge the smaller list into the larger one.

Note that the wiki page goes on to describe much more efficient methods for a disjoint set data structure in which it no longer matters which list is longer.

Upvotes: 0

Related Questions