Reputation: 643
I am learning union-find and to understand it better, I have written a small program:
#include <iostream>
#include <vector>
#include <numeric>
using namespace std;
vector<int> parent, sz;
int find(int i) {
if(parent[i]==i) return i;
return parent[i]=find(parent[i]);
}
void merge(int i, int j) {
int p1=find(i);
int p2=find(j);
if(p1==p2) return;
if(sz[p1]<sz[p2]) {
parent[p1]=p2;
sz[p2]+=sz[p1];
} else {
parent[p2]=p1;
sz[p1]+=sz[p2];
}
}
int main() {
vector<vector<int>> allowedSwaps={{0,4},{4,2},{1,3},{1,4}};
int n=5; //hard-code for now
sz.resize(n,1);
parent.resize(n);
iota(begin(parent),end(parent),0);
cout<<"Parents before: \n";
for(auto& e: parent)
cout<<e<<" ";
cout<<"\n";
for(vector<int>& currswap: allowedSwaps) {
merge(currswap[0],currswap[1]);
}
cout<<"Parents after: \n";
for(auto& e: parent)
cout<<e<<" ";
cout<<"\n";
return 0;
}
allowedSwaps
essentially denotes all the components that are connected to each other. For the code above, all of them are connected.
However, if you notice the output:
Parents before:
0 1 2 3 4
Parents after:
0 0 0 1 0
It shows that one of them (3
, 0-indexed) is not connected. I think that is because when we process the edge {1,3}
, both the vertices 1
and 3
have not been connected to the larger component yet (they are later, by {1,4}
) and so Union Find determines that it is a different component.
Does this mean that the order of edges matter for Union find? Or, am I missing something?
Upvotes: 3
Views: 274
Reputation: 852
You are dealing issue with path compression. After merging all the connected components, just call your find()
function for every element. Please consider adding the following lines at the end of your program:
for(int i=0; i<5; i+=1) find(i);
cout<<"Parents after path compression: \n";
for(auto& e: parent)
cout<<e<<" ";
cout<<"\n";
And you will get the expected result like this:
Parents before:
0 1 2 3 4
Parents after:
0 0 0 1 0
Parents after path compression:
0 0 0 0 0
Upvotes: 0
Reputation: 372892
I was with you in your question up until this point:
Parents after: 0 0 0 1 0
It shows that one of them (3, 0-indexed) is not connected.
Actually, item 3 is indeed linked to all the other items. Think about what would happen if you called find(3)
here. Since 3's parent is 1, and 1's parent is 0, we'd the call to find(3)
would return 0, the same cluster that contains everything else.
I think what might be tripping you up here is that two elements can be in the same cluster even if their immediate parents aren't the same. That's normal, and it's why the find
function traces through the parents until it finds an item that's its own parent.
Upvotes: 1
Reputation: 350375
The output you got does not signify that one node is disconnected.
The parent
data structure represents links from one node to another (or itself, when it is a root).
At the start you have this:
And at the end we have this:
The important thing here is that there is one tree. It is not required that all nodes are linked directly to the root, just that they have a path to the root, which is also true for the node with index 3.
NB: if you would call find(3)
, then also index 3 would receive value 0. It is just something that gets optimised by calling find
, but it doesn't change the meaning of result.
Upvotes: 7