Reputation: 131
I'm trying to optimize a union-find algorithm for finding connected components in images. My image can be a 2d or 3d file, consisting of 0s and 1s. I found an implementation in this thread: Connected Component Labelling, with the answer by user Dukering.
I adapted that code to my purpose. The code works, but the execution time rapidly becomes too big. I don't understand the problem.
My code is shown below. The file I was testing it with is linked here: https://utexas.box.com/s/k12m17rg24fw1yh1p21hytxwq5q8959u That is a 2223x2223 size file (defined in the program below).
As the original user mentioned, this is a basic implementation of union-find, and one can make it more efficient. I don't understand how. In addition, I tested this image in Matlab, and Matlab is much faster. For example, the image linked above takes ~1.5 minutes on my computer, but Matlab does it in like a second using bwlabel. I checked the algorithms bwlabel uses, and it seemed to be some variation of union-find, which is why I started this effort in the first place. How do I make my code work as fast as that? I should also mention that I hope to run my code on much larger images (as large as 1000^3). There is no way my current version can do that.
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
#define w 2223
#define h 2223
void writeArrayInt(int *data, int dims[], char *filename)
{
FILE *fp;
fp = fopen(filename,"w");
/* write grid dimensions */
fwrite(dims, sizeof(int), 3, fp);
/* write data array */
fwrite(data, sizeof(int), w*h, fp);
fclose(fp);
}
void readArrayInt(int *data, int dims[], char *filename)
{
FILE *fp;
fp = fopen(filename,"r");
/* read grid dimensions */
fread(dims, sizeof(int), 3, fp);
/* read data array */
fread(data, sizeof(int), w*h, fp);
fclose(fp);
}
void doUnion(int a, int b, int *component)
{
// get the root component of a and b, and set the one's parent to the other
while (component[a] != a)
a = component[a];
while (component[b] != b)
b = component[b];
component[b] = a;
}
void unionCoords(int x, int y, int x2, int y2, int *component, int *input)
{
int ind1 = x*h + y;
int ind2 = x2*h + y2;
if (y2 < h && x2 < w && input[ind1] && input[ind2] && y2 >= 0 && x2 >= 0)
doUnion(ind1, ind2, component);
}
int main()
{
int i, j;
int *input = (int *)malloc((w*h)*sizeof(int));
int *output = (int *)malloc((w*h)*sizeof(int));
int dims[3];
char fname[256];
sprintf(fname, "phi_w_bin");
readArrayInt(input, dims, fname);
int *component = (int *)malloc((w*h)*sizeof(int));
for (i = 0; i < w*h; i++)
component[i] = i;
for (int x = 0; x < w; x++)
for (int y = 0; y < h; y++)
{
unionCoords(x, y, x+1, y, component, input);
unionCoords(x, y, x, y+1, component, input);
unionCoords(x, y, x-1, y, component, input);
unionCoords(x, y, x, y-1, component, input);
unionCoords(x, y, x+1, y+1, component, input);
unionCoords(x, y, x-1, y+1, component, input);
unionCoords(x, y, x+1, y-1, component, input);
unionCoords(x, y, x-1, y-1, component, input);
}
for (int x = 0; x < w; x++)
{
for (int y = 0; y < h; y++)
{
int c = x*h + y;
if (input[c] == 0)
{
output[c] = input[c];
continue;
}
while (component[c] != c) c = component[c];
int c1 = x*h + y;
output[c1] = component[c];
}
}
sprintf(fname, "outputImage2d");
writeArrayInt(output, dims, fname);
free(input);
free(output);
free(component);
}
Upvotes: 1
Views: 1404
Reputation: 577
One of the good working algorithm for disjoint-sets using union by rank and path compression is follows:
Implementation, using struct Node component[]
. Which contains, the array of all the elements.
#include <stdio.h>
#include <stdlib.h>
struct Node
{
// Needed for union and find.
int parent;
int rank;
};
// Find implementation using path compression, NOTE: a is index of the element to be found.
int find (struct Node *component, int a)
{
if (component[a].parent != a)
return component[a].parent = find(component[a], component[a].parent)
return a;
}
// Union implementation using rank. NOTE: a and b are index of the element
void union(struct Node *component, int a, int b)
{
if (find(component, a) != find(component, b))
{
if (component[a].rank == component[b].rank)
component[a].rank += 1;
if (component[a].rank >= component[b].rank)
component[b].parent = a;
else
component[a].parent = b;
}
}
You can use the above functions, to do Union-Find in constant time (amortised). And it should be clear that you might have to modify the structure, as it fits your data.
You can also implement it in C++ by using templates. But as the question is tagged with C, hence I have provided this solution.
If you wanna read about the above algorithm, this link might help.Union-Find Algorithm
Please comment for any further clarification.
Upvotes: -1
Reputation: 5421
I would recomment two improvements to your union-find structure:
while (component[c] != c)
kind of lines. For reference, check out the informative Wikipedia entry on union-find data structuresfind(x)
returns in component[x]
, thus reducing the time needed in a second call of find(x)
) and union-by-rank or union-by-size (make the larger set the parent of the smaller set)EDIT: Since there seemed to be some clarification needed concerning another answer, I'll add a minimal implementation myself:
typedef struct {
int* parent;
int size;
} union_find;
union_find make_sets(int size) {
union_find result;
result.parent = malloc(sizeof(int) * size);
result.size = size;
for (int i = 0; i < size; ++i) {
result.parent[i] = size;
}
return result;
}
int find(union_find uf, int i) {
if (uf.parent[i] < uf.size)
return uf.parent[i] = find(uf, uf.parent[i]);
return i;
}
void do_union(union_find uf, int i, int j) {
int pi = find(uf, i);
int pj = find(uf, j);
if (pi == pj) {
return;
}
if (pi < pj) {
// link the smaller group to the larger one
uf.parent[pi] = pj;
} else if (pi > pj) {
// link the smaller group to the larger one
uf.parent[pj] = pi;
} else {
// equal rank: link arbitrarily and increase rank
uf.parent[pj] = pi;
++uf.parent[pi];
}
}
Upvotes: 2
Reputation: 15793
Union-find should work in constant time if correctly implemented.
Here are a few ideas:
-- modify find
such that each time when you go up the tree until you reach the root (the root being the node with the property NODE.up = NODE
), you update all the UP
nodes for all the nodes that you followed going up. In other words, when you look for the connected component of 2 nodes, you update that component (represented as the index of its root node) for all the nodes that were followed on that path.
-- the second time when you find the component of a node it will be constant time not only for itself but also for the intermediate nodes.
-- union should all the time take likear time array[node] = parent_node
.
Upvotes: 1