bestrong
bestrong

Reputation: 151

Results of parallel program with nested loops differ from serial program

I would like to use OpenMP for this single thread code:

PROGRAM SINGLE
  INTEGER, DIMENSION(30000)::SUMGRM
  INTEGER, DIMENSION(90000)::GRI,H
  REAL*8::HSTEP1X,HSTEP2X
  REAL*8::TIME1,TIME2

!Just intiial value
  DO I=1, 30000
     SUMGRM(I)=I*3        
  END DO

  DO I=1, 90000
     GRI(I)=I
     H(I)=0.5*I/10000    
  END DO

!Computing computer's running time (start) : for serial programming
 CALL CPU_TIME(TIME1)

 DO K=1, 50000
    DO I=2, 30000
       HSTEP1X=0.0    
         DO J=SUMGRM(I-1)+1, SUMGRM(I)-1
            HSTEP2X=H(GRI(J))/0.99
            HSTEP1X=HSTEP1X+HSTEP2X       
         END DO
       HSTEP2X=H(GRI(SUMGRM(I)))/0.99
       HSTEP1X=HSTEP1X+HSTEP2X         
    END DO
 END DO

  PRINT *, 'Results  =', HSTEP1X
  PRINT *, '   '

!Computing computer's running time (finish) : for serial programming
 CALL CPU_TIME(TIME2)
 PRINT *, 'Elapsed real time = ', TIME2-TIME1, 'second(s)'
END PROGRAM SINGLE

As you can see, the main problem is located at the most inner side looping (J) which is also a function of most outer side looping (I). I've tried to parallelize this program like this:

PROGRAM PARALLEL
  INTEGER, DIMENSION(30000)::SUMGRM
  INTEGER, DIMENSION(90000)::GRI,H
  REAL*8::HSTEP1X,HSTEP2X
  REAL*8::TIME1,TIME2,OMP_GET_WTIME
  INTEGER::Q2,P2

!Just intiial value
  DO I=1, 30000
     SUMGRM(I)=I*3        
  END DO

  DO I=1, 90000
     GRI(I)=I
     H(I)=0.5*I/10000  
  END DO

!Computing computer's running time (start) : for parallel programming
 TIME1= OMP_GET_WTIME()

 DO K=1, 50000
 !$OMP PARALLEL DO PRIVATE (HSTEP1X,Q2,P2)
    DO I=2, 30000
       HSTEP1X=0.0
       Q2=SUMGRM(I-1)+1
       P2=SUMGRM(I)-1
         DO J=Q2, P2
            HSTEP2X=H(GRI(J))/0.99
            HSTEP1X=HSTEP1X+HSTEP2X       
         END DO
       HSTEP2X=H(GRI(SUMGRM(I)))/0.99
       HSTEP1X=HSTEP1X+HSTEP2X     
    END DO
 !$OMP END PARALLEL DO
 END DO

 PRINT *, 'Results  =', HSTEP1X
 PRINT *, '   '

!Computing computer's running time (finish) : for parallel programming
 TIME2= OMP_GET_WTIME()
 PRINT *, 'Elapsed real time = ', TIME2-TIME1, 'second(s)'
END PROGRAM PARALLEL

I'm using gfortran with -O3 -fopenmp and then export OMP_NUM_THREADS=... The parallel program runs faster but the result is different with the single thread code. By the serial program I got 12.1212 (which it is the correct one) and by parallel I got 0.000 (there must be something wrong).

What did I do wrong?

Upvotes: 0

Views: 404

Answers (2)

Holmz
Holmz

Reputation: 51

Have you tried using

!$OMP PARALLEL DO DEFAULT(PRIVATE) REDUCTION(+:HSTEP1X)                                                                                                                                     

Upvotes: 0

d_1999
d_1999

Reputation: 854

Firstly we can note that by default you're likely to find that both j and hstep2x are going to be shared between threads. I don't think this is really what you want as it will lead to some very odd behaviour were multiple threads are using the same iteration index but are trying to loop over different ranges.

Next let's note that your serial code actually just prints the result for the i=30000 iteration as the value of hstep1x is reset to 0 at the start of each iteration. As such to get the "correct" answer in the openmp code we could just focus on reproducing the final iteration -- this completely negates the point of using openmp here I think. I'm guessing this is just a simple case you're trying to use to represent your real problem -- I think you may have missed some of the real problem in producing this.

Nevertheless the below code produces the "correct" answer on my machine. I'm not sure how flexible it is but it works here.

PROGRAM PARALLEL
  INTEGER, DIMENSION(30000)::SUMGRM
  INTEGER, DIMENSION(90000)::GRI,H
  REAL*8::HSTEP1X,HSTEP2X
  REAL*8::TIME1,TIME2,OMP_GET_WTIME
  INTEGER::Q2,P2

!Just intiial value                                                                                                                                                                                                  
  DO I=1, 30000
     SUMGRM(I)=I*3
  END DO

  DO I=1, 90000
     GRI(I)=I
     H(I)=0.5*I/10000
  END DO

!Computing computer's running time (start) : for parallel programming                                                                                                                                                
 TIME1= OMP_GET_WTIME()

 DO K=1, 50000
!$OMP PARALLEL DO PRIVATE (Q2,P2,J,HSTEP2X) DEFAULT(SHARED) LASTPRIVATE(HSTEP1X)                                                                                                                                     
    DO I=2, 30000
       HSTEP1X=0.0
       Q2= SUMGRM(I-1)+1
       P2= SUMGRM(I)-1
         DO J=Q2,P2
            HSTEP2X=H(GRI(J))/0.99
            HSTEP1X=HSTEP1X+HSTEP2X
         END DO
       HSTEP2X=H(GRI(SUMGRM(I)))/0.99
       HSTEP1X=HSTEP1X+HSTEP2X
    END DO
!$OMP END PARALLEL DO                                                                                                                                                                                                
END DO

 PRINT *, 'Results  =', HSTEP1X
 PRINT *, '   '

!Computing computer's running time (finish) : for parallel programming                                                                                                                                               
 TIME2= OMP_GET_WTIME()
 PRINT *, 'Elapsed real time = ', TIME2-TIME1, 'second(s)'
END PROGRAM PARALLEL

I've done three things here:

  1. Make sure j and hstep2x are private to each thread.
  2. Explicitly declared the default behaviour to be shared (not needed here but never mind).
  3. Specified that hstep1x is lastprivate. This means that after exiting the parallel region the value of hstep1x is that taken from the thread which executed the last iteration. (see here for details).

Upvotes: 2

Related Questions