parallel for with OpenMP gets stuck

Question

I have problem with the following code:

int *chosen_pts = new int[k];
std::pair<float, int> *dist2 = new std::pair<float, int>[x.n];
// initialize dist2
for (int i = 0; i < x.n; ++i) {
    dist2[i].first = std::numeric_limits<float>::max();
    dist2[i].second = i;
}

// choose the first point randomly
int ndx = 1;
chosen_pts[ndx - 1] = rand() % x.n;
double begin, end;
double elapsed_secs;
while (ndx < k) {
    float sum_distribution = 0.0;
    // look for the point that is furthest from any center
    begin = omp_get_wtime();
    #pragma omp parallel for reduction(+:sum_distribution)
    for (int i = 0; i < x.n; ++i) {

        int example = dist2[i].second;
        float d2 = 0.0, diff;
        for (int j = 0; j < x.d; ++j) {
            diff = x(example,j) - x(chosen_pts[ndx - 1],j);
            d2 += diff * diff;
        }
        if (d2 < dist2[i].first) {
            dist2[i].first = d2;
        }

        sum_distribution += dist2[i].first;

    }

    end = omp_get_wtime() - begin;

    std::cout << "center assigning -- " 
            << ndx << " of " << k << " = " 
            << (float)ndx / k * 100 
            << "% is done. Elapsed time: " << (float)end << "\n";

    /**/
    bool unique = true;

    do {
        // choose a random interval according to the new distribution
        float r = sum_distribution * (float)rand() / (float)RAND_MAX;
        float sum_cdf = dist2[0].first;
        int cdf_ndx = 0;
        while (sum_cdf < r) {
            sum_cdf += dist2[++cdf_ndx].first;
        }
        chosen_pts[ndx] = cdf_ndx;

        for (int i = 0; i < ndx; ++i) {
            unique = unique && (chosen_pts[ndx] != chosen_pts[i]);
        }
    } while (! unique);


    ++ndx;
}

As you can see, I use OpenMP to parallelize the for loop. It works fine and I get a significant speedup. However, if I increase x.n beyond 20000000, the function stops working after 8-10 iterations:

It doesn't produce any output (std::cout)
Only one core is busy
No error whatsoever

If I comment out the do-while loop, it works as expected again: all cores are busy, there is output after each iteration, and I can increase x.n beyond 100 million, just as I need.

Alexey Kukanov · Accepted Answer

It's not the OpenMP parallel for that gets stuck; it is evidently your serial do-while loop.

One particular issue I see is that there are no array bounds checks in the inner while loop that accesses dist2. In theory, an out-of-bounds access should never happen, but in practice it may (see below for why). So first of all, I would rewrite the calculation of cdf_ndx to guarantee that the loop ends once all elements have been inspected:

    float sum_cdf = 0;
    int cdf_ndx = 0;
    while (sum_cdf < r && cdf_ndx < x.n ) {
        sum_cdf += dist2[cdf_ndx].first;
        ++cdf_ndx;
    }

So how can it happen that sum_cdf never reaches r? It is due to the specifics of floating-point arithmetic and the fact that sum_distribution was computed in parallel, while sum_cdf is computed serially. The contribution of a single element to the sum can fall below the precision of float; in other words, when you add two float values that differ by more than about 7 orders of magnitude (float has a 24-bit significand, and 2^24 ≈ 1.7 × 10^7), the smaller one does not affect the sum.

So, with 20M floats, at some point the next value to add may be so small compared to the accumulated sum_cdf that adding it does not change the sum at all. On the other hand, sum_distribution was essentially computed as several independent partial sums (one per thread) that were then combined. It is therefore more accurate, and possibly larger than sum_cdf can ever reach.

One solution is to compute sum_cdf in portions, using two nested loops. For example:

    float sum_cdf = 0;
    int cdf_ndx = 0;
    while (sum_cdf < r && cdf_ndx < x.n ) {
        float block_sum = 0;
        int block_end = std::min(cdf_ndx + 10000, x.n); // 10000 is an arbitrarily selected block size
        for (int i = cdf_ndx; i < block_end; ++i) {
            block_sum += dist2[i].first;
            if (sum_cdf + block_sum >= r) {
                block_end = i; // adjust to correctly compute cdf_ndx
                break;
            }
        }
        sum_cdf += block_sum;
        cdf_ndx = block_end;
    }

After the loop, check that cdf_ndx < x.n; otherwise, repeat with a new random value of r.
