User3000

Reputation: 131

Is it possible to use OpenMP on these Fortran loops?

Here is my problem: I have a Fortran code with several nested loops, and I would like to know whether it is possible to use OpenMP to parallelize them.

Here is the Fortran code:

  ia = 0
  do i=1,nO
    do a=nO+1,N
      ia = ia + 1

      jb = 0
      do j=1,nO
        do b=nO+1,N
          jb = jb + 1

          chi = 0d0
          do kc=1,nS
            eps = B(kc)**2 + eta**2
            chi = chi + A(i,j,kc)*A(a,b,kc)*B(kc)/eps
          enddo

          C(ia,jb) = C(ia,jb) + D(i,b,j,a) - 4d0*chi

        enddo
      enddo
    enddo
  enddo

Here nS = nO*(N-nO). The dimensions of the arrays are B(nS), A(N,N,nS), C(nS,nS), D(N,N,N,N). Typical values of the integers are nO=88, N=240, nS=13376. It may be worth mentioning that I run it on my lab's cluster, not on my laptop.

I know nothing about OpenMP. I tried to do it myself with examples that I found, but I did not succeed. I have seen many different OpenMP clauses such as PRIVATE, SHARED and REDUCTION, and I tried to use them without success.

Upvotes: 0

Views: 79

Answers (1)

Yes, it should be possible. But it would also be good to rewrite the code somewhat. Notice that you can always compute the value of ia from the known values of i and a; there is no need for the ia = ia + 1 additions. Similarly, you can always compute jb from j and b.

  do i=1,nO
    do a=nO+1,N
      do j=1,nO
        do b=nO+1,N        
          ia = (a-nO) + (i-1)*(N-nO)
          jb = (b-nO) + (j-1)*(N-nO)

          chi = 0d0
          do kc=1,nS
            eps = B(kc)**2 + eta**2
            chi = chi + A(i,j,kc)*A(a,b,kc)*B(kc)/eps
          enddo

          C(ia,jb) = C(ia,jb) + D(i,b,j,a) - 4d0*chi

        enddo
      enddo
    enddo
  enddo

As Ian Bush reminded me in a comment, the memory access pattern is bad. In Fortran, array elements are stored in column-major order. That means that, for best efficiency, the first index should vary in the innermost loop and the last index in the outermost loop. That can be achieved by declaring your arrays transposed or by restructuring the loop nests, which may not always be possible.

In the code above you can switch the order of the i, j, a, b loops without any change to the logic:

  do j=1,nO
    do b=nO+1,N     
      do i=1,nO
        do a=nO+1,N

but it would help even more if the kc index were the first one.

Re-declaring your arrays with the transposed index order may have a large impact on the run time of your program, but it also concerns other parts of your whole program: you have to take the whole program into account.
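
As a rough sketch of what that could look like, here is a hypothetical transposed array At(nS,N,N) used in place of A(N,N,nS); in practice you would re-declare A itself and swap its indices everywhere it is filled and read:

  ! hypothetical transposed array: kc is the first (fastest-varying) index
  double precision :: At(nS,N,N)

  ! the innermost loop then walks contiguous memory
  chi = 0d0
  do kc=1,nS
    eps = B(kc)**2 + eta**2
    chi = chi + At(kc,i,j)*At(kc,a,b)*B(kc)/eps
  enddo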


After that you can apply OpenMP.

I would start with a simple:

  !$omp parallel do default(private) shared(A,B,C,D,nO,N,nS,eta)

Later, if it runs correctly, you can try different schedule clauses to find out what is faster, and possibly also try collapse(4).
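
Putting the pieces together, a minimal sketch of the parallelized loop nest could look like the following (assuming the declarations from the question; collapse(2) and schedule(static) are only starting points to experiment with):

  !$omp parallel do default(private) shared(A,B,C,D,nO,N,nS,eta) &
  !$omp collapse(2) schedule(static)
  do j=1,nO
    do b=nO+1,N
      do i=1,nO
        do a=nO+1,N
          ! ia and jb are computed from the loop indices,
          ! so every thread gets its own correct values
          ia = (a-nO) + (i-1)*(N-nO)
          jb = (b-nO) + (j-1)*(N-nO)

          chi = 0d0
          do kc=1,nS
            eps = B(kc)**2 + eta**2
            chi = chi + A(i,j,kc)*A(a,b,kc)*B(kc)/eps
          enddo

          ! each (ia,jb) element of C is written by exactly one
          ! iteration, so no reduction or atomic update is needed
          C(ia,jb) = C(ia,jb) + D(i,b,j,a) - 4d0*chi
        enddo
      enddo
    enddo
  enddo
  !$omp end parallel do

With nO=88 the outer j loop alone already offers 88-way parallelism, so dropping collapse(2) entirely is also a reasonable first test.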

Upvotes: 1
