Reputation: 204
I am studying this tutorial about OpenMP and I came across this exercise, on page 19. It is a pi calculation algorithm which I have to parallelize:
static long num_steps = 100000;
double step;
void main ()
{
int i;
double x, pi
double sum = 0.0;
step = 1.0 / (double)num_steps;
for(i = 0; i < num_steps; i++)
{
x = (I + 0.5) * step;
sum = sum + 4.0 / (1.0 + x*x);
}
pi = step * sum;
}
I can not use, up to this point, #pragma parallel for. I can only use:
#pragma omp parallel {}
omp_get_thread_num();
omp_set_num_threads(int);
omp_get_num_threads();
My implementation looks like this :
#define NUM_STEPS 800
int main(int argc, char **argv)
{
int num_steps = NUM_STEPS;
int i;
double x;
double pi;
double step = 1.0 / (double)num_steps;
double sum[num_steps];
for(i = 0; i < num_steps; i++)
{
sum[i] = 0;
}
omp_set_num_threads(num_steps);
#pragma omp parallel
{
x = (omp_get_thread_num() + 0.5) * step;
sum[omp_get_thread_num()] += 4.0 / (1.0 + x * x);
}
double totalSum = 0;
for(i = 0; i < num_steps; i++)
{
totalSum += sum[i];
}
pi = step * totalSum;
printf("Pi: %.5f", pi);
}
Ignoring the problem by using an sum array (It explains later that it needs to define a critical section for the sum value with #pragma omp critical or #pragma omp atomic), the above impelentation only works for a limited number of threads (800 in my case), where the serial code uses 100000 steps. Is there a way to achieve this with only the aforementioned OpenMP commands, or am I obliged to use #pragma omp parallel for, which hasn't been mentioned yet in the tutorial?
Thanks a lot for your time, I am really trying to grasp the concept of parallelization in C using OpenMP.
Upvotes: 2
Views: 3891
Reputation:
You will need to find a way to make your parallel algorithm somewhat independent from the number of threads.
The most simple way is to do something like:
int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();
for (int i = tid; i < num_steps; i += n_threads) {
// ...
}
This way the work is split across all threads regardless of the number of threads.
If there were 3 threads and 9 steps:
This works but isn't ideal if each thread is accessing data from some shared array. It is better if threads access sections of data nearby for locality purposes.
In that case you can divide the number of steps by the number of threads and give each thread a contiguous set of tasks like so:
int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();
int steps_per_thread = num_steps / n_threads;
int start = tid * steps_per_thread;
int end = start + steps_per_thread;
for (int i = start; i < end; i++) {
// ...
}
Now the 3 threads performing 9 steps looks like:
This approach is actually what is most likely happening when #pragma omp for
is used. In most cases the compiler just divides the tasks according to the number of threads and assigns each thread a section.
So given a set of 2 threads and a 100 iteration for loop, the compiler would likely give iterations 0-49 to thread 0 and iterations 50-99 to thread 1.
Note that if the number of iterations does not divide evenly by the number of threads the remainder needs to be handled explicitly.
Upvotes: 4