Reputation: 275
I am trying to parallelize some code using OpenMP. The serial time for my current input size is around 9 seconds. The code has the following form:
void myfunction();   /* forward declaration so main() can call it */

int main()
{
    /* do some stuff */
    myfunction();
}

void myfunction()
{
    for (int i=0; i<n; i++)
    {
        //it has some parameters but that is beyond the point I guess
        int rand = custom_random_generator();
        compute(rand);
    }
}
Here the random generator can be executed in parallel since there are no dependencies, and the same goes for the compute function, so I attempted to parallelize this piece, but all my attempts failed. My first thought was to wrap these calls in tasks so they would execute in parallel, but that gave a slower result. Here is what I did:
void myfunction()
{
    for (int i=0; i<n; i++)
    {
        #pragma omp task
        {
            //it has some parameters but that is beyond the point I guess
            int rand = custom_random_generator();
            compute(rand);
        }
    }
}
Result: 23 seconds, more than double the serial time.
Putting the task only on compute() gave the same result.
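That variant looked roughly like this (same caveat about the omitted parameters):
void myfunction()
{
    for (int i=0; i<n; i++)
    {
        int rand = custom_random_generator();
        #pragma omp task
        {
            compute(rand);
        }
    }
}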
An even worse attempt:
void myfunction()
{
    #pragma omp parallel for
    for (int i=0; i<n; i++)
    {
        //it has some parameters but that is beyond the point I guess
        int rand = custom_random_generator();
        compute(rand);
    }
}
Result: 45 seconds
Theoretically speaking, why could this happen? I know that for anyone to diagnose my exact problem they would need a minimal reproducible example, but my goal with this question is to understand the different theories that could explain it and apply them myself. Why would parallelizing an "embarrassingly parallel" piece of code result in far worse performance?
Upvotes: 2
Views: 98
Reputation: 23832
One theory is the overhead associated with creating and maintaining multiple threads (and, in your task version, one task per loop iteration).
The advantages of parallel programming only show up when each iteration performs enough processor-intensive work to outweigh that overhead.
A simple for loop with a cheap routine inside will not benefit from it; the sketch below illustrates the effect.
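As a rough, self-contained illustration (my own sketch, not your code; trivial_work() just stands in for a cheap compute()), spawning one task per trivially cheap iteration tends to be slower than the plain serial loop, because the runtime spends more time creating and scheduling tasks than doing the actual work:
#include <omp.h>
#include <stdio.h>

/* stand-in for a cheap per-iteration routine */
static long trivial_work(int i) { return (long)i * 3 + 1; }

int main(void)
{
    const int n = 1000000;
    long serial_sum = 0, task_sum = 0;

    double t0 = omp_get_wtime();
    for (int i = 0; i < n; i++)
        serial_sum += trivial_work(i);
    double t1 = omp_get_wtime();

    double t2 = omp_get_wtime();
    #pragma omp parallel
    #pragma omp single
    for (int i = 0; i < n; i++)
    {
        #pragma omp task firstprivate(i) shared(task_sum)
        {
            #pragma omp atomic
            task_sum += trivial_work(i);
        }
    }
    /* the implicit barrier at the end of the parallel region waits for all tasks */
    double t3 = omp_get_wtime();

    printf("serial:             %.3f s (sum=%ld)\n", t1 - t0, serial_sum);
    printf("task per iteration: %.3f s (sum=%ld)\n", t3 - t2, task_sum);
    return 0;
}
Compiled with something like gcc -O2 -fopenmp, the task version is expected to lose badly here precisely because each task does almost nothing.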
Upvotes: 1