George Moutsios

Reputation: 31

I am having trouble with OpenMP in C.

I want to parallelize the for loops, but I can't seem to grasp the concept: every time I try to parallelize them, the code still works but slows down dramatically.

for(i=0; i<nbodies; ++i){
    for(j=i+1; j<nbodies; ++j) {
        d2 = 0.0;   
        
        for(k=0; k<3; ++k) {
            
            rij[k] = pos[i][k] - pos[j][k];
            
            d2 += rij[k]*rij[k];
        
        if (d2 <= cut2) {
           d = sqrt(d2);
           d3 = d*d2;
           
           for(k=0; k<3; ++k) {
                double f = -rij[k]/d3;
                forces[i][k] += f;
                forces[j][k] -= f;
           }
           
           ene += -1.0/d; 
        }
       }
    }
}

I tried synchronizing with barrier and critical in some cases, but either nothing changes or the program simply never finishes.

Update: this is the state I'm at right now. It works without crashes, but the calculation time gets worse the more threads I add (Ryzen 5 2600, 6 cores / 12 threads).

#pragma omp parallel shared(d,d2,d3,nbodies,rij,pos,cut2,forces) private(i,j,k) num_threads(n)
    {
        clock_t begin = clock();
       #pragma omp for schedule(auto)
        for(i=0; i<nbodies; ++i){
            
            for(j=i+1; j<nbodies; ++j) {
                d2 = 0.0;
                for(k=0; k<3; ++k) {
                    rij[k] = pos[i][k] - pos[j][k];
                    d2 += rij[k]*rij[k];    
                }
                
                if (d2 <= cut2) {
                    d = sqrt(d2);
                    d3 = d*d2;
                #pragma omp parallel for shared(d3) private(k) schedule(auto) num_threads(n)
                 for(k=0; k<3; ++k) {
                    double f = -rij[k]/d3;
                    #pragma omp atomic 
                    forces[i][k] += f;
                    #pragma omp atomic
                    forces[j][k] -= f;
                    }
                    
                    ene += -1.0/d; 
                }
            }
        }
    
        clock_t end = clock();
        double time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
        #pragma omp single
        printf("Calculation time %lf sec\n",time_spent);
    }

I moved the timer into the parallel code itself (I think it is a few milliseconds faster this way). I also think I got most of the shared and private variables right. The program writes the forces to an output file.
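One thing I am not sure about is the timer itself: as far as I understand, clock() adds up CPU time across all threads of the process, so it can report larger numbers as I add threads even when the wall-clock time actually improves. A minimal sketch of what I should probably measure instead, using omp_get_wtime() (the force calculation is only a stand-in here):

#include <omp.h>
#include <stdio.h>

/* stand-in for the real force calculation from the code above */
static void compute_forces(void) { /* ... parallel loops ... */ }

int main(void) {
    double t0 = omp_get_wtime();   /* wall-clock time, not summed CPU time */
    compute_forces();
    double t1 = omp_get_wtime();
    printf("Calculation time %lf sec\n", t1 - t0);
    return 0;
}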

Upvotes: 0

Views: 103

Answers (3)

George Moutsios

Reputation: 31

Solved. It turns out all I needed was

#pragma omp parallel for nowait

It doesn't need the atomic either.

It's a weird solution and I don't fully understand how it works, but it does, and the output file has no corrupt results whatsoever.

Upvotes: 0

Laci

Reputation: 2818

You should always define your variables in their minimal required scope, especially if performance is an issue (if you do so, your compiler can generate more efficient code). Besides performance, it also helps to avoid data races.

I think you have misplaced a curly brace, and the condition in the first for loop should be i<nbodies-1. The variable ene can be summed up using a reduction, and to avoid data races, atomic operations have to be used to update the forces array, so you do not need slow barriers or critical sections. Your code should look something like this (assuming int for indices and double for the calculation):

#pragma omp parallel for reduction(+:ene)
 for(int i=0; i<nbodies-1; ++i){
    for(int j=i+1; j<nbodies; ++j) {
        double d2 = 0.0;       
        double rij[3];
        for(int k=0; k<3; ++k) {
            rij[k] = pos[i][k] - pos[j][k];            
            d2 += rij[k]*rij[k];       
        }
        if (d2 <= cut2) {
           double d = sqrt(d2);
           double d3 = d*d2;
           
           for(int k=0; k<3; ++k) {
                double f = -rij[k]/d3;
                #pragma omp atomic
                forces[i][k] += f;
                #pragma omp atomic
                forces[j][k] -= f;
           }
           ene += -1.0/d;
        }
    }
 }

Upvotes: 0

Victor Eijkhout

Reputation: 5794

Using barriers or other synchronization will slow down your code unless the amount of unsynchronized work is larger by a good factor, which is not the case here. You probably need to reformulate your code to remove the synchronization.

You are doing something like an N-body simulation. I've worked out a couple of solutions here: https://pages.tacc.utexas.edu/~eijkhout/pcse/html/omp-examples.html#N-bodyproblems

Also: your d2 loop is a reduction, so you could treat it as one, but it is probably enough to make that variable private to the i,j iterations.
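For illustration, one way to remove the per-pair synchronization entirely (this is only a rough sketch, assuming pos, forces, cut2, nbodies and ene are declared as in the question, and <stdlib.h>/<math.h> are included) is to let every thread accumulate into its own private force buffer and merge the buffers once at the end:

double ene = 0.0;   /* mirrors the energy variable from the question */
#pragma omp parallel reduction(+:ene)
{
    /* per-thread scratch buffer, zero-initialized */
    double (*local)[3] = calloc(nbodies, sizeof *local);

    #pragma omp for schedule(dynamic)
    for (int i = 0; i < nbodies-1; ++i) {
        for (int j = i+1; j < nbodies; ++j) {
            double rij[3], d2 = 0.0;
            for (int k = 0; k < 3; ++k) {
                rij[k] = pos[i][k] - pos[j][k];
                d2 += rij[k]*rij[k];
            }
            if (d2 <= cut2) {
                double d = sqrt(d2), d3 = d*d2;
                for (int k = 0; k < 3; ++k) {
                    double f = -rij[k]/d3;
                    local[i][k] += f;
                    local[j][k] -= f;
                }
                ene += -1.0/d;
            }
        }
    }

    /* merge the private buffers into the shared array; this critical
       section runs once per thread, not once per particle pair */
    #pragma omp critical
    for (int i = 0; i < nbodies; ++i)
        for (int k = 0; k < 3; ++k)
            forces[i][k] += local[i][k];

    free(local);
}

The critical section is now entered once per thread instead of once per interacting pair, so its cost is negligible for large nbodies, and schedule(dynamic) helps with the load imbalance of the triangular i,j loop.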

Upvotes: 1
