Reputation: 151
I would like to use OpenMP for this single-threaded code:
PROGRAM SINGLE
  INTEGER, DIMENSION(30000)::SUMGRM
  INTEGER, DIMENSION(90000)::GRI,H
  REAL*8::HSTEP1X,HSTEP2X
  REAL*8::TIME1,TIME2
  !Just initial value
  DO I=1, 30000
    SUMGRM(I)=I*3
  END DO
  DO I=1, 90000
    GRI(I)=I
    H(I)=0.5*I/10000
  END DO
  !Computing computer's running time (start) : for serial programming
  CALL CPU_TIME(TIME1)
  DO K=1, 50000
    DO I=2, 30000
      HSTEP1X=0.0
      DO J=SUMGRM(I-1)+1, SUMGRM(I)-1
        HSTEP2X=H(GRI(J))/0.99
        HSTEP1X=HSTEP1X+HSTEP2X
      END DO
      HSTEP2X=H(GRI(SUMGRM(I)))/0.99
      HSTEP1X=HSTEP1X+HSTEP2X
    END DO
  END DO
  PRINT *, 'Results =', HSTEP1X
  PRINT *, ' '
  !Computing computer's running time (finish) : for serial programming
  CALL CPU_TIME(TIME2)
  PRINT *, 'Elapsed real time = ', TIME2-TIME1, 'second(s)'
END PROGRAM SINGLE
As you can see, the main problem lies in the innermost loop (over J), whose bounds depend on the outer loop index I. I've tried to parallelize this program like this:
PROGRAM PARALLEL
  INTEGER, DIMENSION(30000)::SUMGRM
  INTEGER, DIMENSION(90000)::GRI,H
  REAL*8::HSTEP1X,HSTEP2X
  REAL*8::TIME1,TIME2,OMP_GET_WTIME
  INTEGER::Q2,P2
  !Just initial value
  DO I=1, 30000
    SUMGRM(I)=I*3
  END DO
  DO I=1, 90000
    GRI(I)=I
    H(I)=0.5*I/10000
  END DO
  !Computing computer's running time (start) : for parallel programming
  TIME1= OMP_GET_WTIME()
  DO K=1, 50000
    !$OMP PARALLEL DO PRIVATE (HSTEP1X,Q2,P2)
    DO I=2, 30000
      HSTEP1X=0.0
      Q2=SUMGRM(I-1)+1
      P2=SUMGRM(I)-1
      DO J=Q2, P2
        HSTEP2X=H(GRI(J))/0.99
        HSTEP1X=HSTEP1X+HSTEP2X
      END DO
      HSTEP2X=H(GRI(SUMGRM(I)))/0.99
      HSTEP1X=HSTEP1X+HSTEP2X
    END DO
    !$OMP END PARALLEL DO
  END DO
  PRINT *, 'Results =', HSTEP1X
  PRINT *, ' '
  !Computing computer's running time (finish) : for parallel programming
  TIME2= OMP_GET_WTIME()
  PRINT *, 'Elapsed real time = ', TIME2-TIME1, 'second(s)'
END PROGRAM PARALLEL
I'm compiling with gfortran using -O3 -fopenmp and then setting export OMP_NUM_THREADS=... before running. The parallel program runs faster, but its result differs from the single-threaded code: the serial program gives 12.1212 (which is the correct value), while the parallel one gives 0.000, so something must be wrong.
What did I do wrong?
Upvotes: 0
Views: 404
Reputation: 51
Have you tried using
!$OMP PARALLEL DO DEFAULT(PRIVATE) REDUCTION(+:HSTEP1X)
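Something along these lines (an untested sketch against the loop from the question; I've added SHARED(SUMGRM,GRI,H) as an assumption, since DEFAULT(PRIVATE) would otherwise leave those arrays undefined inside the region, and moved the HSTEP1X=0.0 reset outside the loop so the reduction accumulates a total over all I), dropped inside the K loop in place of the existing directive and loop nest:
HSTEP1X=0.0
!$OMP PARALLEL DO DEFAULT(PRIVATE) SHARED(SUMGRM,GRI,H) REDUCTION(+:HSTEP1X)
DO I=2, 30000
  ! each thread accumulates into its own private copy of HSTEP1X;
  ! the copies are summed when the construct ends
  DO J=SUMGRM(I-1)+1, SUMGRM(I)-1
    HSTEP2X=H(GRI(J))/0.99
    HSTEP1X=HSTEP1X+HSTEP2X
  END DO
  HSTEP2X=H(GRI(SUMGRM(I)))/0.99
  HSTEP1X=HSTEP1X+HSTEP2X
END DO
!$OMP END PARALLEL DO
Note that a reduction like this gives the sum of the contributions from every I iteration, which is a different quantity from the single value the serial code ends up printing.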
Upvotes: 0
Reputation: 854
Firstly, we can note that by default you're likely to find that both j and hstep2x are going to be shared between threads. I don't think this is really what you want, as it will lead to some very odd behaviour where multiple threads are using the same iteration index but are trying to loop over different ranges.
Next, let's note that your serial code actually just prints the result for the i=30000 iteration, as the value of hstep1x is reset to 0 at the start of each iteration. As such, to get the "correct" answer in the OpenMP code we could just focus on reproducing the final iteration -- which completely negates the point of using OpenMP here, I think. I'm guessing this is just a simple case you're using to represent your real problem -- I think you may have lost some of the real problem in producing it.
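For reference, working that final iteration out by hand: with SUMGRM(I)=3*I, the i=30000 pass runs j over 89998 and 89999 and then adds the H(GRI(90000)) term. Since H is declared INTEGER, H(89998), H(89999) and H(90000) all truncate to 4, so hstep1x = 3*4/0.99 ≈ 12.1212, which is exactly the serial result you quote.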
Nevertheless, the code below produces the "correct" answer on my machine. I'm not sure how flexible it is, but it works here.
PROGRAM PARALLEL
  INTEGER, DIMENSION(30000)::SUMGRM
  INTEGER, DIMENSION(90000)::GRI,H
  REAL*8::HSTEP1X,HSTEP2X
  REAL*8::TIME1,TIME2,OMP_GET_WTIME
  INTEGER::Q2,P2
  !Just initial value
  DO I=1, 30000
    SUMGRM(I)=I*3
  END DO
  DO I=1, 90000
    GRI(I)=I
    H(I)=0.5*I/10000
  END DO
  !Computing computer's running time (start) : for parallel programming
  TIME1= OMP_GET_WTIME()
  DO K=1, 50000
    !$OMP PARALLEL DO PRIVATE (Q2,P2,J,HSTEP2X) DEFAULT(SHARED) LASTPRIVATE(HSTEP1X)
    DO I=2, 30000
      HSTEP1X=0.0
      Q2= SUMGRM(I-1)+1
      P2= SUMGRM(I)-1
      DO J=Q2,P2
        HSTEP2X=H(GRI(J))/0.99
        HSTEP1X=HSTEP1X+HSTEP2X
      END DO
      HSTEP2X=H(GRI(SUMGRM(I)))/0.99
      HSTEP1X=HSTEP1X+HSTEP2X
    END DO
    !$OMP END PARALLEL DO
  END DO
  PRINT *, 'Results =', HSTEP1X
  PRINT *, ' '
  !Computing computer's running time (finish) : for parallel programming
  TIME2= OMP_GET_WTIME()
  PRINT *, 'Elapsed real time = ', TIME2-TIME1, 'second(s)'
END PROGRAM PARALLEL
I've done three things here:
1. j and hstep2x are made private to each thread.
2. hstep1x is made lastprivate. This means that after exiting the parallel region the value of hstep1x is the one taken from the thread which executed the last iteration (see here for details). A minimal standalone sketch of this behaviour is shown below.
3. The data-sharing default is stated explicitly with DEFAULT(SHARED), so only the variables listed in the other clauses deviate from it.
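To make the lastprivate behaviour concrete, here is the sketch mentioned in point 2 (a hypothetical toy program -- the name LASTPRIV_DEMO and the variable X are just for illustration and are not part of the question):
PROGRAM LASTPRIV_DEMO
  IMPLICIT NONE
  INTEGER::I
  REAL*8::X
  !$OMP PARALLEL DO LASTPRIVATE(X)
  DO I=1, 1000
    X=2.0D0*I          ! each thread updates its own private copy of X
  END DO
  !$OMP END PARALLEL DO
  ! After the construct X holds the value from the sequentially last
  ! iteration (I=1000), i.e. 2000.0, regardless of which thread ran it.
  PRINT *, 'X after loop =', X
END PROGRAM LASTPRIV_DEMO
Compiled with gfortran -fopenmp, this prints 2000.0 for any thread count, because the copy of X belonging to the iteration I=1000 is copied back out of the construct.
Upvotes: 2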