Thales

Reputation: 1316

Recommended practices for where to place array allocation in Fortran

What are the recommendations or best practices regarding where we should allocate an array?

For instance, in the (simplified version of my) program shown below, I am allocating the output variable (the variable of interest) in the main program. The main program calls subroutine foo, which in turn calls subroutine foo2, which does the actual calculations. My question is: what is the best/recommended practice for where the allocation should be done?

In case it matters, I have a module called global that contains the derived types used in the main program and the main parameters of the code, such as the array sizes (Ni, Nj), tolerances, etc.

program main
    use global
    implicit none

    type(myVar_) :: ans

    Ni = 10
    Nj = 20

    if (allocated(ans%P)) deallocate(ans%P)
    allocate(ans%P(1:Ni, 1:Nj))

    call foo(ans)

    print *, ans%P
end program main

module global
    integer, parameter :: dp=kind(0.d0)

    integer :: Ni, Nj

    type myVar_
        real(dp), allocatable :: P(:,:)
    end type myVar_

end module global

subroutine foo(myVar)
    use global
    implicit none

    type(myVar_) :: myVar

    call foo2(myVar%P)

end subroutine

subroutine foo2(P)
    use global
    implicit none

    real(dp), intent(inout) :: P(:,:)

    ! do calculations for P
end subroutine foo2


Upvotes: 2

Views: 413

Answers (1)

Alex338207

Reputation: 1905

It is indeed good practice to avoid allocation in low-level subroutines and functions for performance reasons. As you can see from [1], a simple addition takes about 1-3 CPU cycles, whereas an allocation and deallocation pair (of a "small" array) can take 200-500 CPU cycles.

I would suggest writing a subroutine that takes "work" variables as arguments and possibly operates in place (i.e. overwrites the input with the result), e.g.
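
To get a feel for this cost on your own machine, a rough timing sketch along the following lines may help. The module and function names (alloc_cost, with_alloc, with_reuse, wall_time) are made up for illustration, and an optimizing compiler may hoist or remove part of the work, so any numbers it produces are indicative only.

```fortran
module alloc_cost
    implicit none
    integer, parameter :: dp = kind(0.d0)
    integer, parameter :: i8 = selected_int_kind(18)
contains

    ! Allocates and frees a small scratch array on every iteration.
    ! Assumes n >= 1.
    function with_alloc(n, reps) result(total)
        integer, intent(in) :: n, reps
        real(dp) :: total
        real(dp), allocatable :: tmp(:)
        integer :: i

        total = 0.0_dp
        do i = 1, reps
            allocate(tmp(n))
            tmp = real(i, dp)
            total = total + tmp(n)
            deallocate(tmp)
        end do
    end function with_alloc

    ! Same arithmetic, but the scratch array is allocated once and reused.
    function with_reuse(n, reps) result(total)
        integer, intent(in) :: n, reps
        real(dp) :: total
        real(dp), allocatable :: tmp(:)
        integer :: i

        total = 0.0_dp
        allocate(tmp(n))
        do i = 1, reps
            tmp = real(i, dp)
            total = total + tmp(n)
        end do
        deallocate(tmp)
    end function with_reuse

    ! Wall-clock time in seconds, taken from the system_clock intrinsic.
    ! Returns 0 if the processor reports no clock.
    function wall_time() result(t)
        real(dp) :: t
        integer(i8) :: ticks, rate

        call system_clock(ticks, rate)
        t = 0.0_dp
        if (rate > 0) t = real(ticks, dp) / real(rate, dp)
    end function wall_time

end module alloc_cost
```

Calling wall_time() before and after each function, with a small n and a few million repetitions, gives a crude comparison of the two variants.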

subroutine do_computation(input,output,work1,work2)
   work1 = ...
   work2 = ...
   output = ...
end subroutine

You could then write a wrapper subroutine that does the allocation for convenience:

subroutine convenient_subroutine(input,output)
   allocate(work1(...),work2(...))
   call do_computation(input,output,work1,work2)
   deallocate(work1,work2)
end subroutine

When performance is not critical, you can call convenient_subroutine; otherwise, call do_computation directly and try to share the work arrays between loop iterations and between other subroutines.
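
A concrete version of this pattern might look like the sketch below. The computation (two passes of a three-point moving average) and all names (smoothing, average3, smooth_twice_core, smooth_twice) are hypothetical, chosen only so that the work array has something to do.

```fortran
module smoothing
    implicit none
    integer, parameter :: dp = kind(0.d0)
contains

    ! Three-point moving average; the end points are copied unchanged.
    ! Assumes size(y) >= size(x) >= 1.
    subroutine average3(x, y)
        real(dp), intent(in)  :: x(:)
        real(dp), intent(out) :: y(:)
        integer :: i, n

        n = size(x)
        y(1) = x(1)
        y(n) = x(n)
        do i = 2, n - 1
            y(i) = (x(i-1) + x(i) + x(i+1)) / 3.0_dp
        end do
    end subroutine average3

    ! Core routine: performs no allocation. The caller supplies a work
    ! array with at least size(input) elements.
    subroutine smooth_twice_core(input, output, work)
        real(dp), intent(in)    :: input(:)
        real(dp), intent(out)   :: output(:)
        real(dp), intent(inout) :: work(:)
        integer :: n

        n = size(input)
        if (n == 0) return
        call average3(input, work(1:n))       ! first pass into scratch space
        call average3(work(1:n), output(1:n)) ! second pass into the result
    end subroutine smooth_twice_core

    ! Convenience wrapper: allocates the work array on every call.
    subroutine smooth_twice(input, output)
        real(dp), intent(in)  :: input(:)
        real(dp), intent(out) :: output(:)
        real(dp), allocatable :: work(:)

        allocate(work(size(input)))
        call smooth_twice_core(input, output, work)
        deallocate(work)
    end subroutine smooth_twice

end module smoothing
```

In a hot loop, the caller would allocate work once outside the loop and call smooth_twice_core on each iteration; smooth_twice is meant for one-off calls where the extra allocation does not matter.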

[1] http://ithare.com/infographics-operation-costs-in-cpu-clock-cycles/

Upvotes: 2
