Nick Brady

Reputation: 151

Why does a subroutine with an array as an input give faster performance than the same subroutine with an automatic local array?

I am re-writing some legacy code to improve readability and hopefully make it easier to maintain.

I am trying to decrease the number of input parameters for the subroutines, but I found that changing subroutine sub(N, ID) to subroutine sub(N) noticeably reduced performance.

ID is only used in sub, so I don't believe it makes sense to have it as an input. Is it possible to use sub(N) without taking the performance hit? (For my uses, N < 10, where the performance is 5-10x worse.)

Performance comparisons:

  1. sub_1 (ID is an automatic local array)

    • N = 4, 0.9 seconds
    • N = 20, 1.0 seconds
    • N = 200, 2.1 seconds
  2. sub_2 (ID is passed in as an argument)

    • N = 4, 0.07 seconds
    • N = 20, 0.18 seconds
    • N = 200, 1.3 seconds

I am using Mac OS 10.14.6 with gfortran 5.2.0.

program test
  integer, parameter  :: N = 1
  real, dimension(N)  :: ID


  call CPU_time(t1)

  do i = 1, 10000000
    CALL sub_1(N)
  end do

  call CPU_time(t2)
  write ( *, * ) 'Elapsed real time =', t2 - t1



  call CPU_time(t1)

  do i = 1, 10000000
    CALL sub_2(N, ID)
  end do

  call CPU_time(t2)
  write ( *, * ) 'Elapsed real time =', t2 - t1

end program test



SUBROUTINE sub_1(N)
  integer,            intent(in)      :: N
  real, dimension(N)                  :: ID

  ID = 0.0

END SUBROUTINE sub_1



SUBROUTINE sub_2(N, ID)
  integer,            intent(in)      :: N
  real, dimension(N), intent(in out)  :: ID

  ID = 0.0

END SUBROUTINE sub_2

Upvotes: 1

Views: 280

Answers (3)

TimK

Reputation: 4835

sub_1 and sub_2 aren't really comparable. In sub_1 you are allocating ID, initializing all of the elements and then throwing it away when the subroutine returns (because it is local to the subroutine).

Since that ID array is never used, the compiler can optimize away the creation and initialization of it. That's what gfortran does if you compile with -O3. The generated code for sub_1 does nothing but return.

In sub_2 it still has to set all of the elements of ID to 0.0.
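
One way to make the two variants comparable is to make the contents of ID observable in both, so that neither fill can be dropped as dead code. The following is only a minimal sketch, not the question's code: the names sub_1_sum and sub_2_sum, the seed argument and the total output are made up for illustration. An optimiser may still simplify the arithmetic, so it is worth inspecting the generated code before trusting any timing.

```fortran
module observable_subs
  implicit none
contains

  ! Automatic local array whose contents feed an output argument,
  ! so the fill is no longer unused work.
  subroutine sub_1_sum(N, seed, total)
    integer, intent(in)  :: N
    real,    intent(in)  :: seed
    real,    intent(out) :: total
    real, dimension(N)   :: ID

    ID = seed
    total = sum(ID)
  end subroutine sub_1_sum

  ! Same work, but on an array supplied by the caller.
  subroutine sub_2_sum(N, ID, seed, total)
    integer,            intent(in)     :: N
    real, dimension(N), intent(in out) :: ID
    real,               intent(in)     :: seed
    real,               intent(out)    :: total

    ID = seed
    total = sum(ID)
  end subroutine sub_2_sum

end module observable_subs

program compare_observable
  use observable_subs
  implicit none
  integer, parameter :: N = 4
  real :: ID(N), total_1, total_2

  call sub_1_sum(N, 1.5, total_1)
  call sub_2_sum(N, ID, 1.5, total_2)

  ! Both variants do the same arithmetic, so the totals should agree.
  if (abs(total_1 - total_2) > 1.0e-6) error stop 'totals differ'
  write (*, *) 'totals:', total_1, total_2
end program compare_observable
```

With both subroutines producing a result that the caller uses, any remaining timing difference is more likely to come from how ID is provided than from one routine being reduced to a bare return.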

Upvotes: 1

Ian Bush

Reputation: 7434

This seems to be a "feature" of the old version of gfortran you are using. With later versions, at least for N = 10, the times are much more comparable:

ian@eris:~/work/stack$ head s.f90
program test
  integer, parameter  :: N = 10
  real, dimension(N)  :: ID


  call CPU_time(t1)

  do i = 1, 10000000
    CALL sub_1(N)
  end do
ian@eris:~/work/stack$ gfortran-5 --version
GNU Fortran (Ubuntu 5.5.0-12ubuntu1) 5.5.0 20171010
Copyright (C) 2015 Free Software Foundation, Inc.

GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING

ian@eris:~/work/stack$ gfortran-5 -O3 s.f90
ian@eris:~/work/stack$ ./a.out
 Elapsed real time =  0.149489999    
 Elapsed real time =   1.99675560E-06
ian@eris:~/work/stack$ gfortran-6 --version
GNU Fortran (Ubuntu 6.5.0-2ubuntu1~18.04) 6.5.0 20181026
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

ian@eris:~/work/stack$ gfortran-6 -O3 s.f90
ian@eris:~/work/stack$ ./a.out
 Elapsed real time =   7.00005330E-06
 Elapsed real time =   5.00003807E-06
ian@eris:~/work/stack$ gfortran-7 --version
GNU Fortran (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

ian@eris:~/work/stack$ gfortran-7 -O3 s.f90
ian@eris:~/work/stack$ ./a.out
 Elapsed real time =   8.00006092E-06
 Elapsed real time =   6.00004569E-06
ian@eris:~/work/stack$ gfortran-8 --version
GNU Fortran (Ubuntu 8.3.0-6ubuntu1~18.04.1) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

ian@eris:~/work/stack$ gfortran-8 -O3 s.f90
ian@eris:~/work/stack$ ./a.out
 Elapsed real time =   9.00030136E-06
 Elapsed real time =   6.00004569E-06

However, I would take all of the above with a bucketful of salt. It is more than likely that the optimiser has worked out that it doesn't actually need to do anything in this simple case, and so has removed all the operations you want to time. The only benchmark that can actually tell you about this is the code you want to run.
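
As a rough illustration of timing something the optimiser cannot simply delete, the sketch below accumulates a checksum from every call and prints it after the loop. It uses system_clock for wall-clock time rather than cpu_time. The names fill_and_sum, NCALLS and checksum are hypothetical and not part of the question's code, and this is still a toy kernel, so it is no substitute for timing the real application.

```fortran
program time_with_checksum
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: N = 10            ! array size under test
  integer, parameter :: NCALLS = 1000000  ! hypothetical repeat count
  integer(int64) :: count_start, count_end, count_rate
  integer        :: i
  real           :: part
  real(real64)   :: checksum

  checksum = 0.0_real64

  call system_clock(count_start, count_rate)
  do i = 1, NCALLS
    ! The input changes every iteration and the result is accumulated,
    ! so the calls cannot be removed as unused work.
    call fill_and_sum(N, real(i), part)
    checksum = checksum + part
  end do
  call system_clock(count_end)

  write (*, *) 'Wall time (s) =', &
       real(count_end - count_start, real64) / real(count_rate, real64)
  ! Printing the checksum makes the loop's result observable.
  write (*, *) 'Checksum      =', checksum

contains

  ! Toy kernel with an automatic local array, as in sub_1.
  subroutine fill_and_sum(n, seed, total)
    integer, intent(in)  :: n
    real,    intent(in)  :: seed
    real,    intent(out) :: total
    real, dimension(n)   :: ID

    ID = seed
    total = sum(ID)
  end subroutine fill_and_sum

end program time_with_checksum
```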

Upvotes: 2

chw21

Reputation: 8140

I assume that it has to do with array allocation.

Allocating memory takes time in itself. When you pass the existing array into sub_2, I think it's very likely that the subroutine does not need to allocate any memory for it. This may assume that the arrays are created on the heap rather than the stack, but I'm not 100% certain.

In sub_1, on the other hand, space for the array has to be allocated anew on every call.
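
If per-call allocation really is the cost (an assumption, not something measured here), one possible workaround is to keep the short argument list of sub(N) but give the local array a fixed compile-time size, which the question's N < 10 would allow. In the sketch below, NMAX, sub_fixed and the total output are hypothetical names introduced for illustration. Whether this helps will depend on the compiler and flags.

```fortran
module fixed_work
  implicit none
  ! Hypothetical upper bound, chosen because the question states N < 10.
  integer, parameter :: NMAX = 10
contains

  subroutine sub_fixed(N, total)
    integer, intent(in)  :: N
    real,    intent(out) :: total
    ! Size is known at compile time, so nothing has to be sized per call.
    real, dimension(NMAX) :: ID

    if (N < 0 .or. N > NMAX) error stop 'sub_fixed: N out of range'

    ! Only the first N elements are used.
    ID(1:N) = 0.0
    total = sum(ID(1:N))
  end subroutine sub_fixed

end module fixed_work

program demo_fixed
  use fixed_work
  implicit none
  real :: total

  call sub_fixed(4, total)
  write (*, *) 'total =', total
end program demo_fixed
```

The trade-off is that the bound is baked into the code: a call with N larger than NMAX stops the program instead of silently working, so the limit has to be kept in step with the real use case.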

I'm unfortunately not too well versed in optimisation, so I hope that other people will agree with me or tell me that I'm wrong ;)

Upvotes: 0
