Reputation: 1321
I'm trying to write a k-means clustering class. I want to make my function parallel.
void kMeans::findNearestCluster()
{
short closest;
int moves = 0;
#pragma omp parallel for reduction(+:moves)
for(int i = 0; i < n; i++)
{
float min_dist=FLT_MAX;
for(int k=0; k < clusters; k++)
{
float dist_sum = 0.0, dist2 = 0.0;
for(int j = 0; j < dim; j++)
{
dist_sum = centroids[k*dim+j] - data[i*dim+j];
dist2 += dist_sum * dist_sum;
}
if (dist2 < min_dist)
{
min_dist = dist2;
closest = k;
}
}
if (assignment[i] != closest)
{
assignment[i] = closest;
moves++;
}
}
this->moves = moves;
}
This is how it should work:
Step 1: Find nearest cluster
Loop through all the datapoints, and compare the distances between all centroids.
When the closest centroid is found, it is stored in the variable called closest
.
Check if this point is assigned to the newly found closest cluster. If not, move it the the new one. (increment moves)
Step 2: Recalculate the centroids, based on the new assignments. (function is not shown)
Repeat Step 1 and Step 2 until no more movement happens.
Without #parallel
moves
converges to zero. If I have #parallel
moves have random values. I think because the parallel loops have conflict to update move
.
Maybe a better strategy would be to have each thread its own move variable, and at the end they would some up.
Upvotes: 0
Views: 481
Reputation: 797
You use the closest
variable inside parallel loop, both writing to it and using it as a check before increasing your moves
variable. But it is declared outside the loop, so all iterations use the same variable! As all iterations are executed (theoretically) in parallel, you cannot expect that any iteration sees what any other iteration wrote to closest
in branch condition if (assignment[i] != closest)
. This variable becomes randomly updated by racing parallel threads. Therefore your moves
evaluates to junk value.
Moving a declaration of closest
inside the outer loop or declaring it as private(closest)
in OpenMP pragma may solve your problem.
By the way, closest
may be uninitialized, and should be better of the same type as k
, i.e. int
.
Upvotes: 1