Reputation: 13206
What is the best way to prevent repeated conditional evaluation of logical values which do not change during the run, but must be specified at runtime?
The application is scientific computing, which involves a large code that reads a range of inputs. The code then runs for days, weeks or even months with these same input values. Some of these inputs are flags which turn on certain features or adjust the calculation methodology. An example would be:
do i = 1, N
  do j = 1, M
    !Some calculation
    calculated_value = ...
    !Flags specify how to use or adjust the calculated_value
    if (flag1) then
      calculated_value = calculated_value + 1
    endif
    if (flag2) then
      call save_value(calculated_value)
    endif
    if (flag3) ...
  end do
end do
The flags must be inside the loop because the features they turn on use the data obtained within the loop. However, this means the flags are evaluated on every iteration, which becomes less and less efficient as the number of flags grows. Some possible solutions that I'm considering include:
I remember hearing that the outcome of a conditional is typically assumed to be the same as its previous value, with the check only performed at the end (branch prediction). If this is the case, then perhaps using fixed flags is not a concern for efficiency. This must be a common problem in numerical computation, but I cannot find a good discussion or solution on Google.
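To make the trade-off concrete, here is a minimal sketch of testing a flag once outside the loop nest rather than on every iteration (often called loop unswitching; optimising compilers may apply it automatically in simple cases). Everything in it is an assumption for illustration: the program name `unswitch_sketch`, the loop bounds, the hard-coded `flag1`, and the dummy arithmetic standing in for the real calculation.

```fortran
! Minimal sketch of manual loop unswitching: the run-time flag is tested
! once, outside the loop nest, instead of on every iteration.
program unswitch_sketch
  implicit none
  integer, parameter :: N = 100, M = 100
  logical :: flag1          ! hypothetical stand-in for a flag read from input
  double precision :: a_inside, a_hoisted
  integer :: i, j

  flag1 = .true.

  ! Version 1: the flag is tested in the innermost loop
  a_inside = 0.d0
  do i = 1, N
    do j = 1, M
      a_inside = a_inside + 0.5d0
      if (flag1) a_inside = a_inside + 1.d0
    end do
  end do

  ! Version 2: the flag is tested once; each branch holds a specialised loop nest
  a_hoisted = 0.d0
  if (flag1) then
    do i = 1, N
      do j = 1, M
        a_hoisted = a_hoisted + 0.5d0
        a_hoisted = a_hoisted + 1.d0
      end do
    end do
  else
    do i = 1, N
      do j = 1, M
        a_hoisted = a_hoisted + 0.5d0
      end do
    end do
  end if

  ! Both versions perform the same arithmetic, so the two values should agree
  print *, a_inside, a_hoisted
end program unswitch_sketch
```

The cost of the second form is visible even here: each extra flag doubles the number of loop copies, which is why writing them out by hand does not scale.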
EDIT: Added code to time four cases: no flags, parameter flags, variable flags, and @Alexander Vogt's approach of using the flags to choose between routines.
!Module of all permutations of flag conditions
module all_variants
contains

  subroutine loop_Flag1_Flag2_Flag3(M,N,a,rand)
    implicit none
    integer, intent(in) :: M, N
    double precision, dimension(:),allocatable, intent(in) :: rand
    double precision, intent(inout) :: a
    integer :: i,j
#define COND_FLAG1
#define COND_FLAG2
#define COND_FLAG3
#include "common_code.inc.F90"
  end subroutine loop_Flag1_Flag2_Flag3

  subroutine loop_Flag1_Flag2_nFlag3(M,N,a,rand)
    implicit none
    integer, intent(in) :: M, N
    double precision, dimension(:),allocatable, intent(in) :: rand
    double precision, intent(inout) :: a
    integer :: i,j
#define COND_FLAG1
#define COND_FLAG2
#ifdef COND_FLAG3
#undef COND_FLAG3
#endif
#include "common_code.inc.F90"
  end subroutine loop_Flag1_Flag2_nFlag3

  subroutine loop_Flag1_nFlag2_nFlag3(M,N,a,rand)
    implicit none
    integer, intent(in) :: M, N
    double precision, dimension(:),allocatable, intent(in) :: rand
    double precision, intent(inout) :: a
    integer :: i,j
#define COND_FLAG1
#ifdef COND_FLAG2
#undef COND_FLAG2
#endif
#ifdef COND_FLAG3
#undef COND_FLAG3
#endif
#include "common_code.inc.F90"
  end subroutine loop_Flag1_nFlag2_nFlag3

  subroutine loop_nFlag1_nFlag2_nFlag3(M,N,a,rand)
    implicit none
    integer, intent(in) :: M, N
    double precision, dimension(:),allocatable, intent(in) :: rand
    double precision, intent(inout) :: a
    integer :: i,j
#ifdef COND_FLAG1
#undef COND_FLAG1
#endif
#ifdef COND_FLAG2
#undef COND_FLAG2
#endif
#ifdef COND_FLAG3
#undef COND_FLAG3
#endif
#include "common_code.inc.F90"
  end subroutine loop_nFlag1_nFlag2_nFlag3

end module all_variants
!Some generic subroutine
subroutine write_a(a)
  implicit none
  double precision,intent(in) :: a
  print*, a
end subroutine write_a

!Main program to time various flag options
program optimise_flags
  use all_variants
  implicit none
  logical :: flag1, flag2, flag3
  logical,parameter :: pflag1 = .false., pflag2=.false., pflag3=.false.
  integer :: i,j, N,M, rep, repeats
  double precision :: a, t1,t2
  double precision :: tnf, tpf, tvf, tppf
  double precision :: anf, apf, avf, appf
  double precision, dimension(:),allocatable :: rand

  !Number of runs and zero counters
  N = 1000; M = 1000; repeats = 1000
  allocate(rand(N*M))
  tnf = 0.d0; tpf = 0.d0; tvf = 0.d0; tppf = 0.d0
  anf = 0.d0; apf = 0.d0; avf = 0.d0; appf = 0.d0

  !Setup variable inputs
  open(unit=10,file='./input')
  read(10,*) flag1
  read(10,*) flag2
  read(10,*) flag3
  close(unit=10,status='keep')

  !Main loop
  do rep = 1, repeats

    !Generate array of random numbers
    !call reset_seed()
    call random_number(rand(:))

    !vvvvvvv Run with no flags vvvvvv
    a = 0.d0
    call cpu_time(t1)
    do i = 1,N
      do j = 1,M
        a = a + rand(j+(i-1)*M)
      enddo
    enddo
    call cpu_time(t2)
    anf = anf + a
    tnf = tnf + t2-t1
    !^^^^^^^ Run with no flags ^^^^^^

    !vvvvvvv Run with parameter flags vvvvvv
    a = 0.d0
    call cpu_time(t1)
    do i = 1,N
      do j = 1,M
        a = a + rand(j+(i-1)*M)
        if (pflag1) a = a + 1.d0
        if (pflag2) call write_a(a)
        if (pflag3) a = a**3.d0
      enddo
    enddo
    call cpu_time(t2)
    apf = apf + a
    tpf = tpf + t2-t1
    !^^^^^^^ Run with parameter flags ^^^^^^

    !vvvvvvv Run with variable input flags vvvvvvv
    a = 0.d0
    call cpu_time(t1)
    do i = 1,N
      do j = 1,M
        a = a + rand(j+(i-1)*M)
        if (flag1) a = a + 1.d0
        if (flag2) call write_a(a)
        if (flag3) a = a**3.d0
      enddo
    enddo
    call cpu_time(t2)
    avf = avf + a
    tvf = tvf + t2-t1
    ! ^^^^^^ Run with variable input flags ^^^^^^

    ! vvvvvvv Run with copied subroutines flags vvvvvvv
    a = 0.d0
    call cpu_time(t1)
    !Choose a subroutine using pre-defined flags
    if ( flag1 ) then
      if ( flag2 ) then
        if ( flag3 ) then
          call loop_Flag1_Flag2_Flag3(M,N,a,rand)
        else
          call loop_Flag1_Flag2_nFlag3(M,N,a,rand)
        endif
      else
        call loop_Flag1_nFlag2_nFlag3(M,N,a,rand)
      endif
    else
      call loop_nFlag1_nFlag2_nFlag3(M,N,a,rand)
    endif
    call cpu_time(t2)
    appf = appf + a
    tppf = tppf + t2-t1
    ! ^^^^^^^ Run with copied subroutines flags ^^^^^^^

  enddo

  print'(4(a,e14.7))', 'Results: for no flag = ', anf, ' Param flag = ', apf, ' Variable flag = ', avf, ' Pre-proc =', appf
  print'(4(a,f14.7))', 'Timings: for no flag = ', tnf, ' Param flag = ', tpf, ' Variable flag = ', tvf, ' Pre-proc =', tppf
end program optimise_flags
With an input file containing:
.false.
.false.
.false.
My timing results vary depending on optimisation flags and compiler, typically:
For ifort -fpp -O3 -xHost -ipo -fast optimise_flags.f90
For gfortran -cpp -O3 optimise_flags.f90
The conclusion is that using variable flags does result in a performance penalty, and that the solution proposed by @Alexander Vogt works.
Upvotes: 1
Views: 146
Reputation: 18098
As far as I know, these flags are a concern, especially if the compiler can't easily optimize them away. My best guess would be to separate the subroutines if performance is critical. Below I sketch a scheme for implementing that without code duplication. Whether it speeds up your code depends on the actual code and on the complexity of the loop and the conditionals, so you need to try it out to see whether it is really worth the effort.
You can realize the last option you mentioned (separate but almost identical subroutines) efficiently with #include to avoid code duplication:
common_code.inc.F90:
do i = 1, N
  do j = 1, M
    !Some calculation
    calculated_value = ...
    !Flags specify how to use or adjust the calculated_value
#ifdef COND_FLAG1
    calculated_value = calculated_value + 1
#endif
#ifdef COND_FLAG2
    call save_value(calculated_value)
#endif
#ifdef COND_FLAG3
    !...
#endif
  end do
end do
Individual subroutines:
module all_variants
contains
  subroutine loop_Flag1_nFlag2_nFlag3()
    ! ...
#define COND_FLAG1
#ifdef COND_FLAG2
#undef COND_FLAG2
#endif
#ifdef COND_FLAG3
#undef COND_FLAG3
#endif
#include "common_code.inc.F90"
  end subroutine
  ! ...
end module
Then you need to treat all cases:
if ( flag1 ) then
  if ( flag2 ) then
    if ( flag3 ) then
      call loop_Flag1_Flag2_Flag3()
    else
      call loop_Flag1_Flag2_nFlag3()
    endif
  else
    ! ...
  endif
else
  ! ...
endif
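If the nested if blocks become unwieldy as flags are added, one possible alternative is to pack the flags into a single integer and dispatch with select case. The sketch below is only an illustration of that idea, not part of the scheme above: the program name `dispatch_sketch` and the variable `variant` are hypothetical, the flag values are hard-coded rather than read from input, and the print statements stand in for the actual subroutine calls.

```fortran
! Sketch: select one specialised routine from three run-time flags
! by encoding them as a bit pattern in a single integer.
program dispatch_sketch
  implicit none
  logical :: flag1, flag2, flag3
  integer :: variant

  ! Hard-coded here for illustration; a real code would read these from input
  flag1 = .true.; flag2 = .false.; flag3 = .false.

  ! Bit 0 = flag1, bit 1 = flag2, bit 2 = flag3
  variant = merge(1, 0, flag1) + merge(2, 0, flag2) + merge(4, 0, flag3)

  ! The dispatch runs once, outside any hot loop
  select case (variant)
  case (0)
    print *, 'would call loop_nFlag1_nFlag2_nFlag3'
  case (1)
    print *, 'would call loop_Flag1_nFlag2_nFlag3'
  case (3)
    print *, 'would call loop_Flag1_Flag2_nFlag3'
  case (7)
    print *, 'would call loop_Flag1_Flag2_Flag3'
  case default
    print *, 'variant not generated in this sketch:', variant
  end select
end program dispatch_sketch
```

With the values shown, `variant` is 1, so the second branch is taken. The dispatch itself costs nothing measurable because it is evaluated once per call to the loop nest rather than once per iteration.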
Upvotes: 1