Beginner in OpenMP - Problems in cicle

I am a beginner in OpenMP and i am trying to parallelize the following function:

void calc(double *x, int *l[N], int d[N], double *z){

    #pragma omp parallel for
    for(int i=0; i<N; i++){

        double tmp = d[i]>0 ? ((double) z[i] / d[i]) : ((double) z[i] / N);

        for(int j=0; j<d[i]; j++)
            x[l[i][j]] += tmp;

    }

}

But for an N=100000 the sequential time is about 50 seconds and with 2 or more threads it goes up to several minutes.

The L array of pointers has randomly between 1 and 30 elements (given by the corresponding position in the d array) and the elements varies between 0 and N, so i know i have a load-balance problem but if i had a guided or dynamic scheduling (even auto) the times are even worse.

I also know that the problem is obviously in the accesses to the x array because its not being contiguously acceded but is there a way to fix this problem and have some kind of speedups in this function?

Thanks in advance!

Upvotes: 1

Views: 87

Answers (1)

Jerry Coffin
Jerry Coffin

Reputation: 490018

Assuming you can afford to use some extra space to do it, you can probably speed this up.

The basic idea would be to create a separate array of sums for each thread, then when they're all done add up the corresponding elements in those separate copies, and finally add each element of that result to the corresponding element in the original x.

As long as x is fairly small that's probably pretty reasonable. If x might be really huge, it may get less practical in a hurry. Given that L is apparently only about 30 elements, it sounds like x is probably limited to around 30 elements (that can actually be used while running this code, anyway) as well. If that's correct, then having a separate copy for each thread shouldn't cause a major problem.

Upvotes: 1

Related Questions