OpenMP loop runs code slower than serial loop

Question

I'm running this neat little gravity simulation and in serial execution it takes a little more than 4 minutes, when i parallelize one loop inside a it increases to about 7 minutes and if i try parallelizing more loops it increases to more than 20 minutes. I'm posting a slightly shortened version without some initializations but I think they don't matter. I'm posting the 7 minute version however with some comments where i wanted to add parallelization to loops. Thank you for helping me with my messy code.

#include 
#include 
#include 
#include 
#include 

#define numb 1000
int main(){
  double pos[numb][3],a[numb][3],a_local[3],v[numb][3];
  memset(v, 0.0, numb*3*sizeof(double));
  double richtung[3];
  double t,deltat=0.0,r12 = 0.0,endt=10.;
  unsigned seed;
  int tcount=0;
  #pragma omp parallel private(seed) shared(pos)
  {
    seed = 25235 + 16*omp_get_thread_num();
    #pragma omp for 
    for(int i=0;i



I'm using 
    g++ -fopenmp -o test_grav test_grav.c 
to compile the code and I'm measuring time in the shell just by 
    time ./test_grav. 
When I used 
    get_numb_threads() 
to get the number of threads it displayed 4. top also shows more than 300% (sometimes ~380%) cpu usage. Interesting little fact if I start the parallel region before the time-loop (meaning the most outer for-loop) and without any actual #pragma omp for it is equivalent to making one parallel region for every major (the three second to most outer loops) loop. So I think it is an optimization thing, but I don't know how to solve it. Can anyone help me?

Edit: I made the example verifiable and lowered numbers like numb to make it better testable but the problem still occurs. Even when I remove the critical region as suggested by TheQuantumPhysicist, just not as severely.

OpenMP loop runs code slower than serial loop

Answers (1)

Related Questions