Reputation: 151
I am re-writing some legacy code to improve readability and hopefully make it easier to maintain.
I am trying to decrease the number of input parameters for the subroutines, but I found that changing subroutine sub(N, ID) --> subroutine sub(N) had noticeably reduced performance.
ID is only used in sub, so I don't believe it makes sense to have it as an input.
Is it possible to use sub(N) without taking the performance hit?
(For my uses, N < 10, where the performance is 5-10x worse.)
Performance comparisons:
sub_1: N = 4, 0.9 seconds; N = 20, 1.0 seconds; N = 200, 2.1 seconds
sub_2: N = 4, 0.07 seconds; N = 20, 0.18 seconds; N = 200, 1.3 seconds
I am using Mac OS 10.14.6 with gfortran 5.2.0.
program test
    integer, parameter :: N = 1
    real, dimension(N) :: ID

    call CPU_time(t1)
    do i = 1, 10000000
        CALL sub_1(N)
    end do
    call CPU_time(t2)
    write ( *, * ) 'Elapsed real time =', t2 - t1

    call CPU_time(t1)
    do i = 1, 10000000
        CALL sub_2(N, ID)
    end do
    call CPU_time(t2)
    write ( *, * ) 'Elapsed real time =', t2 - t1
end program test

SUBROUTINE sub_1(N)
    integer, intent(in) :: N
    real, dimension(N) :: ID
    ID = 0.0
END SUBROUTINE sub_1

SUBROUTINE sub_2(N, ID)
    integer, intent(in) :: N
    real, dimension(N), intent(in out) :: ID
    ID = 0.0
END SUBROUTINE sub_2
Upvotes: 1
Views: 280
Reputation: 4835
sub_1 and sub_2 aren't really comparable. In sub_1 you are allocating ID, initializing all of the elements and then throwing it away when the subroutine returns (because it is local to the subroutine).
Since that ID array is never used, the compiler can optimize away the creation and initialization of it. That's what gfortran does if you compile with -O3. The generated code for sub_1 does nothing but return.
In sub_2 it still has to set all of the elements of ID to 0.0.
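To make the two routines comparable, the work has to feed a result that the caller actually reads, so the optimiser cannot simply discard it. Below is a minimal sketch of that idea, not the original code: the module bench_subs, the routines sub_1_sum and sub_2_sum, the extra arguments x and total, and the checksum variable are all hypothetical names added for illustration. An optimiser may still simplify parts of this, so treat the timings as indicative only.

```fortran
module bench_subs
    implicit none
contains
    ! Same shape as sub_1: ID is a local automatic array,
    ! but its contents now feed a result the caller reads.
    subroutine sub_1_sum(N, x, total)
        integer, intent(in) :: N
        real, intent(in) :: x
        real, intent(out) :: total
        real, dimension(N) :: ID
        ID = x
        total = sum(ID)
    end subroutine sub_1_sum

    ! Same shape as sub_2: ID is supplied by the caller.
    subroutine sub_2_sum(N, ID, x, total)
        integer, intent(in) :: N
        real, dimension(N), intent(in out) :: ID
        real, intent(in) :: x
        real, intent(out) :: total
        ID = x
        total = sum(ID)
    end subroutine sub_2_sum
end module bench_subs

program test_observable
    use bench_subs
    implicit none
    integer, parameter :: N = 4
    integer, parameter :: NCALLS = 10000000
    real, dimension(N) :: ID
    real :: t1, t2, total
    double precision :: checksum
    integer :: i

    ! Time the variant with the local automatic array.
    checksum = 0.0d0
    call cpu_time(t1)
    do i = 1, NCALLS
        call sub_1_sum(N, real(i), total)
        checksum = checksum + total
    end do
    call cpu_time(t2)
    write (*, *) 'sub_1_sum CPU time =', t2 - t1, ' checksum =', checksum

    ! Time the variant with the caller-supplied array.
    checksum = 0.0d0
    call cpu_time(t1)
    do i = 1, NCALLS
        call sub_2_sum(N, ID, real(i), total)
        checksum = checksum + total
    end do
    call cpu_time(t2)
    write (*, *) 'sub_2_sum CPU time =', t2 - t1, ' checksum =', checksum
end program test_observable
```

Printing the checksum is what keeps the loops alive: both versions now have to produce the same observable value, so any remaining time difference is more likely to come from how ID is obtained than from one loop being deleted.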
Upvotes: 1
Reputation: 7434
This seems to be a "feature" of the old version of gfortran you are using. With later versions, at least for N = 10, the times are much more comparable:
ian@eris:~/work/stack$ head s.f90
program test
integer, parameter :: N = 10
real, dimension(N) :: ID
call CPU_time(t1)
do i = 1, 10000000
CALL sub_1(N)
end do
ian@eris:~/work/stack$ gfortran-5 --version
GNU Fortran (Ubuntu 5.5.0-12ubuntu1) 5.5.0 20171010
Copyright (C) 2015 Free Software Foundation, Inc.
GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING
ian@eris:~/work/stack$ gfortran-5 -O3 s.f90
ian@eris:~/work/stack$ ./a.out
Elapsed real time = 0.149489999
Elapsed real time = 1.99675560E-06
ian@eris:~/work/stack$ gfortran-6 --version
GNU Fortran (Ubuntu 6.5.0-2ubuntu1~18.04) 6.5.0 20181026
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
ian@eris:~/work/stack$ gfortran-6 -O3 s.f90
ian@eris:~/work/stack$ ./a.out
Elapsed real time = 7.00005330E-06
Elapsed real time = 5.00003807E-06
ian@eris:~/work/stack$ gfortran-7 --version
GNU Fortran (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
ian@eris:~/work/stack$ gfortran-7 -O3 s.f90
ian@eris:~/work/stack$ ./a.out
Elapsed real time = 8.00006092E-06
Elapsed real time = 6.00004569E-06
ian@eris:~/work/stack$ gfortran-8 --version
GNU Fortran (Ubuntu 8.3.0-6ubuntu1~18.04.1) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
ian@eris:~/work/stack$ gfortran-8 -O3 s.f90
ian@eris:~/work/stack$ ./a.out
Elapsed real time = 9.00030136E-06
Elapsed real time = 6.00004569E-06
However, I would take all of the above with a bucketful of salt. It is more than likely that the optimiser has worked out that it doesn't actually need to do anything in this simple case and has simply removed all the operations you want to time. The only benchmark that can actually tell you about this is the code you want to run.
Upvotes: 2
Reputation: 8140
I assume that it has to do with array allocation.
The process of allocating memory itself takes time. When you pass the array unaltered into the subroutine sub_2, I think it's very likely that the subroutine does not need to allocate memory for the array. This may depend on the arrays being created on the heap rather than the stack, but I'm not 100% certain.
The subroutine sub_1, on the other hand, needs to allocate the space for the array anew on every call.
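If per-call allocation really is the cost, one way to keep the sub(N) signature is to give the local array a fixed compile-time size, so no run-time allocation should be needed. This is only a sketch under assumptions: the names sub_fixed_mod, sub_fixed and NMAX are hypothetical, and the bound of 16 is chosen only because the question states N < 10. I have not timed it against the original code.

```fortran
module sub_fixed_mod
    implicit none
    ! Hypothetical upper bound on N; the question states N < 10.
    integer, parameter :: NMAX = 16
contains
    subroutine sub_fixed(N)
        integer, intent(in) :: N
        ! Fixed-size local array: its size is known at compile time,
        ! unlike the automatic array real, dimension(N) in sub_1.
        real, dimension(NMAX) :: ID

        ! Guard against callers exceeding the assumed bound.
        if (N > NMAX) error stop 'sub_fixed: N exceeds NMAX'

        ID(1:N) = 0.0
        ! ... work on ID(1:N) here ...
    end subroutine sub_fixed
end module sub_fixed_mod

program demo_fixed
    use sub_fixed_mod
    implicit none
    call sub_fixed(4)
    write (*, *) 'sub_fixed(4) returned'
end program demo_fixed
```

The trade-off is that the bound has to be chosen in advance and checked. gfortran also documents a -fstack-arrays option that places arrays of unknown size on the stack; whether it changes the timings here is something you would have to measure.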
I'm unfortunately not too well versed in optimisation, so I hope that other people will agree with me or tell me that I'm wrong ;)
Upvotes: 0