Optimisation of scientific code with constant flags set by input file

Question

What is the best way to prevent repeated conditional evaluation of logical values which do not change during the run, but must be specified at runtime?

The application is scientific computing, which involves a large code that reads a range of inputs. The code then runs for days, weeks or even months with these same input values. Some of these inputs are flags which turn on certain features or adjust the calculation methodology. An example would be:

do i = 1, N
do j = 1, M

    !Some calculation
    calculated_value = ...

    !Flags specify how to use or adjust the calculated_value
    if (flag1) then
        calculated_value = calculated_value  + 1
    endif

    if (flag2) then
        call save_value(calculated_value)
    endif

    if (flag3) ...

end do
end do

The flags must be inside the loop because the features they turn on use data obtained within the loop. However, this means every flag is tested on every iteration, which becomes less and less efficient as the number of flags grows. Some possible solutions that I'm considering include:

Parse the input file (e.g. with python/bash), generate a file of parameters and include these in the compiled code.
Profile-guided compiler optimisation (although in my experience, this typically performs worse than aggressive static flags).
Fortran protected module variables, to give the compiler a hint that these values won't change (does this work?).
Some use of function pointers or objects to change what is calculated each time.
Entirely separate (but almost identical) subroutines for each combination of flags.
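For reference, the third option in the list above could be sketched as follows. The module name `run_flags` and the setter `set_flags` are hypothetical. The `protected` attribute only guarantees that code outside the module cannot assign to the flags; because `set_flags` can still be called at any time, whether a given compiler treats the values as loop-invariant is not established here and would have to be timed.

```fortran
! Sketch only: module, routine and variable names are hypothetical.
module run_flags
    implicit none
    ! protected: readable everywhere, writable only inside this module
    logical, protected :: flag1 = .false.
    logical, protected :: flag2 = .false.
contains
    ! The single place where the flags are set (e.g. after reading the input file)
    subroutine set_flags(f1, f2)
        logical, intent(in) :: f1, f2
        flag1 = f1
        flag2 = f2
    end subroutine set_flags
end module run_flags

program protected_demo
    use run_flags
    implicit none
    integer :: i
    double precision :: a

    call set_flags(.true., .false.)
    ! flag1 = .false.   ! not allowed here: flag1 is protected outside run_flags

    a = 0.d0
    do i = 1, 10
        a = a + 1.d0
        if (flag1) a = a + 1.d0   ! flag is still tested on every iteration
        if (flag2) a = 0.d0
    end do
    print *, a
end program protected_demo
```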

I remember hearing that conditional statements are typically assumed to take their previous value, with a check performed only at the end. If this is the case, then perhaps using fixed flags is not a concern for efficiency. This must be a common problem in numerical computation, but I cannot find a good discussion or solution on Google.

EDIT: Added code to time four variants: no flags, parameter flags, variable flags, and the approach from @Alexander Vogt's answer, in which the flags select one of several routines.

!Module of subroutine variants, one per flag combination (only 4 of the 8 are written out)
module all_variants

contains

subroutine loop_Flag1_Flag2_Flag3(M,N,a,rand)
    implicit none
    integer, intent(in) :: M, N
    double precision, dimension(:),allocatable, intent(in)   :: rand
    double precision, intent(inout)   :: a

    integer    :: i,j

#define COND_FLAG1
#define COND_FLAG2
#define COND_FLAG3

#include "common_code.inc.F90"

end subroutine loop_Flag1_Flag2_Flag3

subroutine loop_Flag1_Flag2_nFlag3(M,N,a,rand)
    implicit none
    integer, intent(in) :: M, N
    double precision, dimension(:),allocatable, intent(in)   :: rand
    double precision, intent(inout)   :: a

    integer    :: i,j

#define COND_FLAG1
#define COND_FLAG2
#ifdef COND_FLAG3
#undef COND_FLAG3
#endif

#include "common_code.inc.F90"

end subroutine loop_Flag1_Flag2_nFlag3

subroutine loop_Flag1_nFlag2_nFlag3(M,N,a,rand)
    implicit none
    integer, intent(in) :: M, N
    double precision, dimension(:),allocatable, intent(in)   :: rand
    double precision, intent(inout)   :: a

    integer    :: i,j

#define COND_FLAG1
#ifdef COND_FLAG2
#undef COND_FLAG2
#endif
#ifdef COND_FLAG3
#undef COND_FLAG3
#endif

#include "common_code.inc.F90"

end subroutine loop_Flag1_nFlag2_nFlag3

subroutine loop_nFlag1_nFlag2_nFlag3(M,N,a,rand)
    implicit none
    integer, intent(in) :: M, N
    double precision, dimension(:),allocatable, intent(in)   :: rand
    double precision, intent(inout)   :: a

    integer    :: i,j

#ifdef COND_FLAG1
#undef COND_FLAG1
#endif
#ifdef COND_FLAG2
#undef COND_FLAG2
#endif
#ifdef COND_FLAG3
#undef COND_FLAG3
#endif

#include "common_code.inc.F90"

end subroutine loop_nFlag1_nFlag2_nFlag3

end module all_variants

!Some generic subroutine
subroutine write_a(a)
    implicit none

    double precision,intent(in) :: a

    print*, a

end subroutine write_a

!Main program to time various flag options

program optimise_flags
    use all_variants
    implicit none

    logical             :: flag1, flag2, flag3
    logical,parameter   :: pflag1 = .false., pflag2=.false., pflag3=.false.
    integer             :: i,j, N,M, rep, repeats
    double precision    :: a, t1,t2
    double precision    :: tnf, tpf, tvf, tppf
    double precision    :: anf, apf, avf, appf
    double precision, dimension(:),allocatable   :: rand

    !Number of runs and zero counters
    N = 1000; M = 1000; repeats = 1000
    allocate(rand(N*M))
    tnf = 0.d0; tpf = 0.d0; tvf = 0.d0; tppf = 0.d0
    anf = 0.d0; apf = 0.d0; avf = 0.d0; appf = 0.d0

    !Setup variable inputs
    open(unit=10,file='./input')
    read(10,*) flag1
    read(10,*) flag2
    read(10,*) flag3
    close(unit=10,status='keep')

    !Main loop
    do rep = 1, repeats

        !Generate array of random numbers
        !call reset_seed()
        call random_number(rand(:))

        !vvvvvvv Run with no flags vvvvvv
        a = 0.d0
        call cpu_time(t1)
        do i = 1,N
        do j = 1,M
            a = a + rand(j+(i-1)*M)
        enddo
        enddo
        call cpu_time(t2)
        anf = anf + a
        tnf = tnf + t2-t1
        !^^^^^^^ Run with no flags ^^^^^^

        !vvvvvvv Run with parameter flags vvvvvv
        a = 0.d0
        call cpu_time(t1)
        do i = 1,N
        do j = 1,M
            a = a + rand(j+(i-1)*M)

            if (pflag1) a = a + 1.d0
            if (pflag2) call write_a(a)
            if (pflag3) a = a**3.d0
        enddo
        enddo
        call cpu_time(t2)
        apf = apf + a
        tpf = tpf + t2-t1
        !^^^^^^^ Run with parameter flags ^^^^^^

        !vvvvvvv Run with variable input flags vvvvvvv
        a = 0.d0
        call cpu_time(t1)
        do i = 1,N
        do j = 1,M
            a = a + rand(j+(i-1)*M)

            if (flag1) a = a + 1.d0
            if (flag2) call write_a(a)
            if (flag3) a = a**3.d0
        enddo
        enddo
        call cpu_time(t2)
        avf = avf + a
        tvf = tvf + t2-t1
        ! ^^^^^^ Run with variable input flags  ^^^^^^

        ! vvvvvvv Run with copied subroutines flags vvvvvvv
        a = 0.d0
        call cpu_time(t1)
        !Choose a subroutine using pre-defined flags
        if ( flag1 ) then
          if ( flag2 ) then
            if ( flag3 ) then
              call loop_Flag1_Flag2_Flag3(M,N,a,rand)
            else
              call loop_Flag1_Flag2_nFlag3(M,N,a,rand)
            endif
          else
              call loop_Flag1_nFlag2_nFlag3(M,N,a,rand)
          endif
        else
            call loop_nFlag1_nFlag2_nFlag3(M,N,a,rand)
        endif
        call cpu_time(t2)
        appf = appf + a
        tppf = tppf + t2-t1
        ! ^^^^^^^ Run with copied subroutines flags ^^^^^^^

    enddo

    print'(4(a,e14.7))', 'Results: for no flag = ', anf,  ' Param flag = ', apf, ' Variable flag = ', avf, ' Pre-proc =', appf
    print'(4(a,f14.7))', 'Timings: for no flag = ', tnf,  ' Param flag = ', tpf, ' Variable flag = ', tvf, ' Pre-proc =', tppf

end program optimise_flags

With an input file containing:

.false.
.false.
.false.

My timing results vary depending on optimisation flags and compiler. Typical values (total CPU seconds over the 1000 repeats) are as follows. For ifort -fpp -O3 -xHost -ipo -fast optimise_flags.f90:

no flag = 0.2499380
Param flag = 0.2427720
Variable flag = 0.9796880
@Alexander Vogt multi-subroutines = 0.2427100

For gfortran -cpp -O3 optimise_flags.f90:

no flag = 0.8855360
Param flag = 0.8882080
Variable flag = 0.9222320
@Alexander Vogt multi-subroutines = 0.8848810

The conclusion is that variable flags do incur a performance penalty (roughly a factor of four with ifort, about 4% with gfortran in these runs) and that the solution proposed by @Alexander Vogt removes it.
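For a single flag, the idea behind the multi-subroutine variant can be sketched by hand: test the flag once outside the loop and keep one flag-free copy of the loop per branch. This is a minimal illustration with hypothetical names (`hoist_flag`, `a_inside`, `a_hoisted`), not the timing code above; it only checks that both forms give the same result and makes no timing claim.

```fortran
! Sketch only: names are hypothetical; flag stands in for a value read from input.
program hoist_flag
    implicit none
    integer, parameter :: n = 1000
    integer :: i
    logical :: flag
    double precision :: x(n), a_inside, a_hoisted

    call random_number(x)
    flag = .true.

    ! Flag tested on every iteration
    a_inside = 0.d0
    do i = 1, n
        a_inside = a_inside + x(i)
        if (flag) a_inside = a_inside + 1.d0
    end do

    ! Flag tested once; each branch holds a flag-free copy of the loop
    a_hoisted = 0.d0
    if (flag) then
        do i = 1, n
            a_hoisted = a_hoisted + x(i)
            a_hoisted = a_hoisted + 1.d0
        end do
    else
        do i = 1, n
            a_hoisted = a_hoisted + x(i)
        end do
    end if

    ! Both forms perform the same additions in the same order
    if (abs(a_inside - a_hoisted) > 1.d-9) error stop 'variants disagree'
    print *, a_inside, a_hoisted
end program hoist_flag
```

With several flags the number of loop copies doubles per flag, which is why generating them from a single shared source becomes attractive.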

Alexander Vogt · Accepted Answer

As far as I know, these flags are a concern, especially if the compiler can't easily optimize them away. My best guess would be to separate the subroutines if performance is critical. Below I sketch a scheme for implementing that without code duplication. Whether it speeds up your code depends on the actual code and on the complexity of the loop and the conditionals, so you need to try it out to see whether it is really worth the effort.

You can realize the last option you mentioned (separate (but almost identical) subroutines) efficiently with #include to avoid code duplication:

common_code.inc.F90:

do i = 1, N
do j = 1, M

    !Some calculation
    calculated_value = ...

    !Flags specify how to use or adjust the calculated_value
    !(directives start in column 1, as required by traditional-mode cpp, e.g. gfortran -cpp)
#ifdef COND_FLAG1
    calculated_value = calculated_value  + 1
#endif

#ifdef COND_FLAG2
    call save_value(calculated_value)
#endif

#ifdef COND_FLAG3
    !...
#endif

end do
end do

Individual subroutines:

module all_variants

contains

  subroutine loop_Flag1_nFlag2_nFlag3()
    ! ...
    ! Directives start in column 1, as required by traditional-mode cpp
#define COND_FLAG1

#ifdef COND_FLAG2
#undef COND_FLAG2
#endif

#ifdef COND_FLAG3
#undef COND_FLAG3
#endif

#include "common_code.inc.F90"
  end subroutine

  ! ...
end module

Then you need to handle every combination of flags, selecting the matching subroutine once, outside the loop:

if ( flag1 ) then
  if ( flag2 ) then
    if ( flag3 ) then
      call loop_Flag1_Flag2_Flag3()
    else
      call loop_Flag1_Flag2_nFlag3()
    endif
  else
    ! ...
  endif
else
  ! ...
endif
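As the number of flags grows, the nested if blocks above get hard to read. One possible alternative, sketched here under the assumption that the variants are named as in the question, is to pack the flags into a single integer and dispatch with select case. The module `dispatch_demo` and routine `run_variant` are hypothetical, and the print statements stand in for the real calls so that the sketch is self-contained.

```fortran
! Sketch only: dispatch_demo and run_variant are hypothetical names, and the
! print statements are stand-ins for calls to the generated loop_* routines.
module dispatch_demo
    implicit none
contains
    subroutine run_variant(flag1, flag2, flag3, chosen)
        logical, intent(in)  :: flag1, flag2, flag3
        integer, intent(out) :: chosen

        ! Pack the three flags into one integer in 0..7 (flag1 is the high bit)
        chosen = merge(4, 0, flag1) + merge(2, 0, flag2) + merge(1, 0, flag3)

        select case (chosen)
        case (7)
            print *, 'would call loop_Flag1_Flag2_Flag3'
        case (6)
            print *, 'would call loop_Flag1_Flag2_nFlag3'
        case (4)
            print *, 'would call loop_Flag1_nFlag2_nFlag3'
        case (0)
            print *, 'would call loop_nFlag1_nFlag2_nFlag3'
        case default
            ! Combinations for which no subroutine has been written yet
            print *, 'no variant generated for flag combination ', chosen
        end select
    end subroutine run_variant
end module dispatch_demo

program test_dispatch
    use dispatch_demo
    implicit none
    integer :: k

    call run_variant(.true., .false., .false., k)   ! k = 4
    call run_variant(.false., .true., .true., k)    ! k = 3, falls to default
end program test_dispatch
```

The default branch makes a missing variant visible at run time instead of silently falling into the wrong routine.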
