Reputation: 181
I am new to OpenMP. I have written the following code in Fortran and tried to parallelise it using OpenMP. Unfortunately, it takes the same time as the serial version of this subroutine. I am compiling it with the f2py command below. I am sure I am missing a key concept here, but I am unable to figure it out. I would really appreciate any help with this.
!f2py -c --opt='-O3' --f90flags='-fopenmp' -lgomp -m g3Test g3TestA.f90
exp1 = 0.0
exp2 = 0.0
exp3 = 0.0
! xConfig is written in every iteration of k, so it must be private to avoid a data race
!$OMP PARALLEL DO shared(X,s1,s2,s3,c1,c2,c3) private(h,xConfig) &
!$OMP REDUCTION(+:exp1,exp2,exp3)
do k=0,numRows-1
   xConfig(0:2) = X(k,0:2)
   do h=0,nPhi-1
      exp1(h) = exp1(h)+exp(-((xConfig(0)-c1(h))**2)*s1)
      exp2(h) = exp2(h)+exp(-((xConfig(1)-c2(h))**2)*s2)
      exp3(h) = exp3(h)+exp(-((xConfig(2)-c3(h))**2)*s3)
   end do
end do
!$OMP END PARALLEL DO
ALine = exp1+exp2+exp3
Upvotes: 0
Views: 244
Reputation: 1263
As neatly explained in, for example, this OpenMP performance training course material from the University of Edinburgh, there are a number of reasons why OpenMP code does not necessarily scale as you would expect: how much of the serial runtime is taken by the part you are parallelising, synchronisation between threads, communication, and other parallel overheads.
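To put a number on the first of those reasons, here is a minimal sketch of Amdahl's law, which bounds the speedup when only a fraction of the serial runtime is parallelised. The function name and the sample fractions are made up for illustration; the real fraction for your subroutine has to be measured, and the bound ignores synchronisation and other overheads.

```python
def amdahl_speedup(parallel_fraction: float, num_threads: int) -> float:
    """Upper bound on speedup: 1 / ((1 - p) + p / n).

    parallel_fraction (p): share of the serial runtime spent in the
    part that is parallelised, between 0 and 1.
    num_threads (n): number of threads used for that part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_threads)


if __name__ == "__main__":
    # Illustrative fractions only, not measurements of the code in the question.
    for p in (0.5, 0.9, 0.99):
        for n in (2, 4, 8):
            print(f"p={p:4.2f}  threads={n}  max speedup={amdahl_speedup(p, n):.2f}")
```

If the loop is only a small part of the time spent in the Python call (for example because of array copies at the f2py boundary), the achievable speedup stays close to 1 however many threads are used.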
You can easily test the performance with different numbers of threads by setting OMP_NUM_THREADS when calling your Python script, e.g. with 2 threads:
env OMP_NUM_THREADS=2 python <your script name>
and you may consider adding the following lines to your code example to confirm how many threads are actually used in the OpenMP part of your code:
do k=0,numRows-1
   ! this if-statement is only for debugging, remove for timing
   ! (needs "!$ use omp_lib" in the declarations so OMP_get_num_threads is declared)
   !$ if (k==0) then
   !$    print *, 'num_threads running:', OMP_get_num_threads()
   !$ end if
   xConfig(0:2) = X(k,0:2)
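The thread-count comparison described above can also be scripted rather than run by hand. The following is a rough sketch, not a proper benchmark: the helper name time_with_threads is made up, and the stand-in command would need to be replaced by your own script. It measures the wall time of the whole process, so Python start-up and module import are included in every number.

```python
import os
import subprocess
import sys
import time


def time_with_threads(command, thread_counts):
    """Run `command` once per thread count; return {threads: wall seconds}."""
    timings = {}
    for n in thread_counts:
        # Copy the current environment and override only OMP_NUM_THREADS.
        env = dict(os.environ, OMP_NUM_THREADS=str(n))
        start = time.perf_counter()
        subprocess.run(command, env=env, check=True)
        timings[n] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Stand-in command that only echoes the variable; replace it with
    # [sys.executable, "<your script name>"] to time the real code.
    demo = [sys.executable, "-c",
            "import os; print('OMP_NUM_THREADS =', os.environ['OMP_NUM_THREADS'])"]
    for threads, seconds in time_with_threads(demo, [1, 2, 4]).items():
        print(f"{threads} thread(s): {seconds:.3f} s")
```

If the timings barely change between 1 and 4 threads, either the parallel loop is a small share of the total runtime or the OpenMP directives are not being compiled in at all, which the debug print above helps to distinguish.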
Upvotes: 2