Reputation: 131
Here is my problem: I have a Fortran code with a number of nested loops, and I would like to know if it is possible to use OpenMP to parallelize them.
Here is the Fortran code:
ia = 0
do i=1,nO
  do a=nO+1,N
    ia = ia + 1
    jb = 0
    do j=1,nO
      do b=nO+1,N
        jb = jb + 1
        chi = 0d0
        do kc=1,nS
          eps = B(kc)**2 + eta**2
          chi = chi + A(i,j,kc)*A(a,b,kc)*B(kc)/eps
        enddo
        C(ia,jb) = C(ia,jb) + D(i,b,j,a) - 4d0*chi
      enddo
    enddo
  enddo
enddo
Where nS = nO*(N-nO). The dimensions of the arrays are: B(nS), A(N,N,nS), C(nS,nS), D(N,N,N,N). Example values for the integers: nO=88, N=240, nS=13376. It may be worth mentioning that I run this on my lab's cluster, not my laptop.
I know nothing about OpenMP. I tried to do it myself with examples that I found, but I did not succeed. I have seen many different OpenMP clauses like PRIVATE, SHARED, and REDUCTION, and I tried to use them without success.
Upvotes: 0
Views: 79
Reputation: 60123
Yes, it should be possible. But it would also be good to rewrite the code somewhat. Notice that you can always compute the value of ia from the known values of i and a; there is no need for the ia = ia + 1 additions. Similarly, you can always compute jb from j and b.
do i=1,nO
  do a=nO+1,N
    do j=1,nO
      do b=nO+1,N
        ia = (a-nO) + (i-1)*(N-nO)
        jb = (b-nO) + (j-1)*(N-nO)
        chi = 0d0
        do kc=1,nS
          eps = B(kc)**2 + eta**2
          chi = chi + A(i,j,kc)*A(a,b,kc)*B(kc)/eps
        enddo
        C(ia,jb) = C(ia,jb) + D(i,b,j,a) - 4d0*chi
      enddo
    enddo
  enddo
enddo
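As a sanity check (a minimal standalone sketch with small toy sizes, not part of the original code), you can verify that the closed-form index formula agrees with the incremental counter:

```fortran
program check_indices
  implicit none
  integer, parameter :: nO = 3, N = 8   ! toy sizes, not the real nO=88, N=240
  integer :: i, a, ia, ia_formula

  ia = 0
  do i = 1, nO
    do a = nO+1, N
      ia = ia + 1                        ! incremental counter, as in the question
      ia_formula = (a-nO) + (i-1)*(N-nO) ! closed-form replacement
      if (ia /= ia_formula) stop 'mismatch'
    enddo
  enddo
  print *, 'all indices match:', ia
end program check_indices
```

The same argument applies to jb computed from j and b.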
As Ian Bush reminded me in the comment, the memory access pattern is bad. In Fortran, the memory order of the storage of array elements is column-major. That means that the first index should be in the innermost loop and the last index in the outermost loop for best efficiency. That can be achieved by declaring your arrays transposed or by restructuring the loop nests - which may not always be possible.
In the code above you can switch the order of the i, j, a, b loops without any change to the logic:
do j=1,nO
  do b=nO+1,N
    do i=1,nO
      do a=nO+1,N
but it would help even more if the kc index were the first one.
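For illustration, assuming you are free to transpose the declarations so that kc is the first dimension (e.g. A(nS,N,N) instead of A(N,N,nS)), the innermost loop would then stride over contiguous memory:

```fortran
! Sketch, assuming A is re-declared as A(nS,N,N); B(nS) is already contiguous.
chi = 0d0
do kc = 1, nS
  eps = B(kc)**2 + eta**2
  ! First index varies fastest in Fortran's column-major storage,
  ! so A(kc,i,j) and A(kc,a,b) are now unit-stride accesses.
  chi = chi + A(kc,i,j)*A(kc,a,b)*B(kc)/eps
enddo
```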
Re-declaring your arrays with the index order transposed may have a large impact on the run time of your program. It also affects other parts of your whole program, so you have to take the whole program into account.
After that you can apply OpenMP.
I would start with a simple:
!$omp parallel do default(private) shared(A,B,C,D,eta,nO,N,nS)
Later, if it runs correctly, you can try different schedule clauses to find out what is faster, and possibly also try collapse(4).
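Putting the pieces together, a possible starting point could look like the sketch below (using the closed-form index formulas; the collapse depth and schedule are things to experiment with, not a definitive recommendation):

```fortran
! Each (j,b) pair maps to a unique column jb of C, so with the j and b
! loops collapsed no two threads ever update the same element of C.
!$omp parallel do default(private) shared(A,B,C,D,eta,nO,N,nS) collapse(2)
do j=1,nO
  do b=nO+1,N
    do i=1,nO
      do a=nO+1,N
        ia = (a-nO) + (i-1)*(N-nO)
        jb = (b-nO) + (j-1)*(N-nO)
        chi = 0d0
        do kc=1,nS
          eps = B(kc)**2 + eta**2
          chi = chi + A(i,j,kc)*A(a,b,kc)*B(kc)/eps
        enddo
        C(ia,jb) = C(ia,jb) + D(i,b,j,a) - 4d0*chi
      enddo
    enddo
  enddo
enddo
!$omp end parallel do
```

Note that eta must appear in the shared list (or be made firstprivate); with default(private) it would otherwise be undefined inside the parallel region.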
Upvotes: 1