Megidd
Megidd

Reputation: 7961

How to store data elements in groups/clusters by union-find

I have implemented union-find algorithm inspired by this code with this API:

void unify(int p, int q)
int find(int p)
boolean connected(int p, int q)
int clusterSize(int p)

Then I detect clusters with a nested loop like this:

for i = (0) to (data size)
    for j = (i+1) to (data size)
        if ( "i" and "j" meet this condition )
            unify(i,j)
    end for
end for

To actually store each cluster/group in a separate array, currently I loop over all my data elements and call find like this:

for i = 0 to data-size
    k = find(i);
    // ... Store data element "i" in cluster "k" (array "k")
end-for

I have a feeling like, the last for loop to store data elements in different groups/clusters might not be necessary. I wonder if I'm on right track?

Upvotes: 2

Views: 799

Answers (1)

Arnab Roy
Arnab Roy

Reputation: 619

First you need to initialize by making each element a separate cluster. While doing so there's no need to call find method as initially each of them itself is a cluster.

int clusters[data_size];
for(int i = 0; i < data_size; i++)
{
  clusters[i] = i; // each element refers to itself initially, indicating that it's a separate cluster
}

Now upon the nested loop check which elements are needed to be clustered, and then unify those -

for i = (0) to (data size)
    for j = (i+1) to (data size)
        if ( "i" and "j" meet this condition )
            unify(i,j)
    end for
end for

In the unify(p, q) method, what you can do is -

clusters[find(p)] = find(q); // indicates p belongs to cluster number q

Or the vice versa is also true -

clusters[find(q)] = find(p); // indicates q belongs to cluster number p

There's a ranking technique in the Union Find algorithm, to choose which of the above to use. That ensures - we will always merge the smaller cluster into the bigger cluster. But I assume you are not worried about that right now.

Now, the next question would be - why are we calling find in the above expression. It's because, we are actually finding out the root element of a certain cluster(Or simply find the element which actually represent the cluster).

Upvotes: 1

Related Questions