Reputation: 7961
I have implemented union-find algorithm inspired by this code with this API:
void unify(int p, int q)
int find(int p)
boolean connected(int p, int q)
int clusterSize(int p)
Then I detect clusters with a nested loop like this:
for i = (0) to (data size)
for j = (i+1) to (data size)
if ( "i" and "j" meet this condition )
unify(i,j)
end for
end for
To actually store each cluster/group in a separate array
, currently I loop over all my data elements and call find
like this:
for i = 0 to data-size
k = find(i);
// ... Store data element "i" in cluster "k" (array "k")
end-for
I have a feeling like, the last for loop
to store data elements in different groups/clusters might not be necessary. I wonder if I'm on right track?
Upvotes: 2
Views: 799
Reputation: 619
First you need to initialize by making each element a separate cluster. While doing so there's no need to call find
method as initially each of them itself is a cluster.
int clusters[data_size];
for(int i = 0; i < data_size; i++)
{
clusters[i] = i; // each element refers to itself initially, indicating that it's a separate cluster
}
Now upon the nested loop check which elements are needed to be clustered, and then unify those -
for i = (0) to (data size)
for j = (i+1) to (data size)
if ( "i" and "j" meet this condition )
unify(i,j)
end for
end for
In the unify(p, q)
method, what you can do is -
clusters[find(p)] = find(q); // indicates p belongs to cluster number q
Or the vice versa is also true -
clusters[find(q)] = find(p); // indicates q belongs to cluster number p
There's a ranking technique in the Union Find
algorithm, to choose which of the above to use. That ensures - we will always merge the smaller cluster into the bigger cluster. But I assume you are not worried about that right now.
Now, the next question would be - why are we calling find
in the above expression. It's because, we are actually finding out the root element of a certain cluster(Or simply find the element which actually represent the cluster).
Upvotes: 1