Kyum

Reputation: 45

Fortran DO loop index order for code optimization

First of all, my English is not good. I'm sorry.

As far as I know, Fortran arrays are stored in column-major order. My old Fortran code has not been optimized for a long time, so I am trying to change the array index order in my Fortran 90 code for better speed.

The code works almost entirely on 3-dimensional arrays (i, j, k), and almost every DO loop runs over i and j. The sizes of i and j are about 2000~3000, and k is just 2; it stands for x and y.

My old code's index order is (i, k, j).

For example:

do j = 1, 1500
    do i = 1, 1024
        AA(i, 1, j) = ...
        AA(i, 2, j) = ...
    end do
end do

There are a lot of these in my code.

So I tried different index orders: (i, j, k), (k, i, j), and (i, k, j). I thought (k, i, j) would be the best choice in Fortran, because it is column-major.

But the results do not show that.

All three cases, (i, j, k), (k, i, j), and (i, k, j), take almost the same time (1961 s, 1955 s, and 1692 s, respectively).

My program is long, and the number of iterations (32000) is large enough for a comparison.

Below are my compile options.

ifort -O3 -xHost -ipo -qopenmp -fp-model strict -mcmodel=medium

I don't understand the result above. Please help me.

Thanks for reading.

Additionally, below is one part of my program. The array L_X(i, :, j) is my target; the : covers indices 1 and 2.

!$OMP Parallel DO private(j,i,ii,Tan,NormT)

do j=1,LinkPlusBndry
  if (Kmax(j)>2) then
     i=1; Tan=L_X(i+1,:,j)-L_X(i,:,j); NormT=sqrt(Tan(1)**2+Tan(2)**2)
     if (NormT < min_dist) then
        L_X(2:Kmax(j)-1,:,j)=L_X(3:Kmax(j),:,j)
        Kmax(j)=Kmax(j)-1
     elseif (NormT > max_dist) then
        do i=Kmax(j)+1,3,-1; L_X(i,:,j)=L_X(i-1,:,j); end do
        L_X(2,:,j)=(L_X(1,:,j)+L_X(3,:,j))/2.0_dp
        Kmax(j)=Kmax(j)+1
     end if
     do i=2,M-1
       if (i > (Kmax(j)-2) ) exit
       Tan=L_X(i+1,:,j)-L_X(i,:,j); NormT=sqrt(Tan(1)**2+Tan(2)**2)
       if (NormT < min_dist) then
         L_X(i,:,j)=(L_X(i,:,j)+L_X(i+1,:,j))/2.0_dp
         L_X(i+1:Kmax(j)-1,:,j)=L_X(i+2:Kmax(j),:,j)
         Kmax(j)=Kmax(j)-1
       elseif (NormT > max_dist) then
         do ii=Kmax(j)+1,i+2,-1; L_X(ii,:,j)= L_X(ii-1,:,j); end do
         L_X(i+1,:,j)=(L_X(i,:,j)+L_X(i+2,:,j))/2.0_dp
         Kmax(j)=Kmax(j)+1
       end if
     end do
     i=Kmax(j)-1;
     if (i>1) then
       Tan=L_X(i+1,:,j)-L_X(i,:,j); NormT=sqrt(Tan(1)**2+Tan(2)**2)
       if (NormT < min_dist) then
         L_X(Kmax(j)-1,:,j)=L_X(Kmax(j),:,j)
         Kmax(j)=Kmax(j)-1
       elseif (NormT > max_dist) then
         L_X(Kmax(j)+1,:,j)= L_X(Kmax(j),:,j)
         L_X(Kmax(j),:,j)=(L_X(Kmax(j)-1,:,j)+L_X(Kmax(j)+1,:,j))/2.0_dp
         Kmax(j)=Kmax(j)+1
       end if
     end if
  elseif (Kmax(j)==2) then
     i=1; Tan=L_X(i+1,:,j)-L_X(i,:,j); NormT=sqrt(Tan(1)**2+Tan(2)**2)
     if (NormT > max_dist) then
        do i=Kmax(j)+1,3,-1; L_X(i,:,j)=L_X(i-1,:,j); end do
        L_X(2,:,j)=(L_X(1,:,j)+L_X(3,:,j))/2.0_dp
        Kmax(j)=Kmax(j)+1
     end if
  end if
  do i=Kmax(j)+1,M; L_X(i,:,j)=L_X(Kmax(j),:,j); end do
end do

!$OMP End Parallel DO

Upvotes: 0

Views: 326

Answers (1)

Dan Sp.

Reputation: 1447

I would not worry so much about loop ordering. ifort at -O3 optimizes loops aggressively. It's possible that reordering your 3-D arrays will have little to no effect.

As for your thinking that (k,i,j) is the best order: in general it would be. But k has only 2 elements and i has 1024. Assuming you are using single-precision reals (4 bytes), this 2-D segment of your 3-D array fits in 8 KB of memory. It is likely that your data, once the loop starts, sits entirely in the CPU cache, so index ordering would be irrelevant. You need much larger data dimensions for the effect you're considering to show up.
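To make the 8 KB estimate concrete, here is a minimal sketch of the arithmetic. The sizes 1024, 2, and 1500 come from the example loop in the question; the program name and variable names are mine, and the real kind is an assumption. The `2.0_dp` literals in the posted code hint at double precision, in which case one slab would be twice as large, which is still small compared with typical cache sizes.

```fortran
program slab_footprint
  use, intrinsic :: iso_fortran_env, only: real32, real64
  implicit none
  ! Sizes taken from the example loop in the question (assumed representative).
  integer, parameter :: ni = 1024, nk = 2, nj = 1500
  integer :: bytes_sp, bytes_dp

  ! storage_size returns the size in bits, so divide by 8 to get bytes.
  bytes_sp = storage_size(1.0_real32) / 8
  bytes_dp = storage_size(1.0_real64) / 8

  ! One (i, k) slab for a fixed j: this is what the inner loop touches.
  print '(a,i0,a)', 'one j-slab, single precision:  ', ni*nk*bytes_sp, ' bytes'
  print '(a,i0,a)', 'one j-slab, double precision:  ', ni*nk*bytes_dp, ' bytes'

  ! The whole array, for comparison with the last-level cache size.
  print '(a,i0,a)', 'whole array, single precision: ', ni*nk*nj*bytes_sp, ' bytes'
end program slab_footprint
```

With 4-byte reals the first line works out to 1024 * 2 * 4 = 8192 bytes, which is the 8 KB figure above.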

As for the performance difference you measured, that is likely just the compiler optimizations working out differently for each layout.
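If you want to check this in isolation rather than inside the full program, a small timing test is one option. The following is only a sketch under assumptions: the array names, the repeat count `nrep`, and the loop body are hypothetical stand-ins for your real update, and double precision is assumed. It times the same double loop on the three layouts and prints a checksum so the compiler cannot discard the work.

```fortran
program layout_timing
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
  ! Sizes from the question's example loop; nrep is an arbitrary repeat count.
  integer, parameter :: ni = 1024, nj = 1500, nk = 2, nrep = 200
  real(real64), allocatable :: a_ikj(:,:,:), a_kij(:,:,:), a_ijk(:,:,:)
  integer(int64) :: t0, t1, rate
  integer :: i, j, rep

  allocate(a_ikj(ni,nk,nj), a_kij(nk,ni,nj), a_ijk(ni,nj,nk))
  a_ikj = 0.0_real64
  a_kij = 0.0_real64
  a_ijk = 0.0_real64

  call system_clock(count_rate=rate)

  ! Layout (i, k, j): the original order in the question.
  call system_clock(t0)
  do rep = 1, nrep
    do j = 1, nj
      do i = 1, ni
        a_ikj(i,1,j) = a_ikj(i,1,j) + real(i, real64)
        a_ikj(i,2,j) = a_ikj(i,2,j) + real(j, real64)
      end do
    end do
  end do
  call system_clock(t1)
  print '(a,f8.3,a,es12.4)', '(i,k,j): ', real(t1-t0, real64)/real(rate, real64), &
        ' s, checksum ', sum(a_ikj)

  ! Layout (k, i, j): both components of a point are adjacent in memory.
  call system_clock(t0)
  do rep = 1, nrep
    do j = 1, nj
      do i = 1, ni
        a_kij(1,i,j) = a_kij(1,i,j) + real(i, real64)
        a_kij(2,i,j) = a_kij(2,i,j) + real(j, real64)
      end do
    end do
  end do
  call system_clock(t1)
  print '(a,f8.3,a,es12.4)', '(k,i,j): ', real(t1-t0, real64)/real(rate, real64), &
        ' s, checksum ', sum(a_kij)

  ! Layout (i, j, k): the two components live in separate 2-D planes.
  call system_clock(t0)
  do rep = 1, nrep
    do j = 1, nj
      do i = 1, ni
        a_ijk(i,j,1) = a_ijk(i,j,1) + real(i, real64)
        a_ijk(i,j,2) = a_ijk(i,j,2) + real(j, real64)
      end do
    end do
  end do
  call system_clock(t1)
  print '(a,f8.3,a,es12.4)', '(i,j,k): ', real(t1-t0, real64)/real(rate, real64), &
        ' s, checksum ', sum(a_ijk)
end program layout_timing
```

The three checksums should agree, since each layout performs the same arithmetic. In all three versions the inner loop runs over i with a small, fixed stride, so I would not be surprised if the timings come out close; growing the dimensions is the way to see when layout starts to matter.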

Upvotes: 2
