What are the recommendations or best practices regarding where we should allocate an array?
For instance, consider the (simplified version of my) program below. I allocate the output variable (the variable of interest) in the main program. The main program calls subroutine foo, which in turn calls subroutine foo2, which does the actual calculations.
My question is: what is the best/recommended practice regarding where the allocation should be done? Since foo2 does the actual calculation, should it allocate the arrays? Or, since foo calls foo2, should foo allocate the array and foo2 just do the calculations?
If it is important, I have a module called global that contains the derived types used in the main program and the main parameters of the code, such as the size of each array (Ni, Nj), tolerances, etc.
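! Modules must be compiled before the program that uses them, so they come first.
module global
  implicit none
  integer, parameter :: dp = kind(0.d0)
  integer :: Ni, Nj
  type myVar_
    real(dp), allocatable :: P(:,:)
  end type myVar_
end module global

! foo2 has an assumed-shape dummy argument P(:,:), which requires an explicit
! interface; placing foo and foo2 in a module (name chosen for illustration) provides it.
module calc
  use global
  implicit none
contains
  subroutine foo(myVar)
    type(myVar_), intent(inout) :: myVar
    call foo2(myVar%P)
  end subroutine foo

  subroutine foo2(P)
    real(dp), intent(inout) :: P(:,:)
    ! do calculations for P (placeholder assignment so the program runs)
    P = 0.0_dp
  end subroutine foo2
end module calc

program main
  use global
  use calc
  implicit none
  type(myVar_) :: ans
  Ni = 10
  Nj = 20
  if (allocated(ans%P)) deallocate(ans%P)
  allocate(ans%P(1:Ni, 1:Nj))
  call foo(ans)
  print *, ans%P   ! P is a component of ans, not a standalone variable
end program main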
It is indeed good practice to avoid allocation in low-level subroutines and functions for performance reasons. As you can see from [1], a simple addition takes about 1-3 CPU cycles, while an allocation and deallocation pair (of a "small" array) can take 200-500 CPU cycles.
I would suggest writing a subroutine that takes "work" arrays as arguments and, where possible, operates in place (i.e. overwrites the input with the result), e.g.
subroutine do_computation(input,output,work1,work2)
work1 = ...
work2 = ...
output = ...
end subroutine
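The "operate in place" variant can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the question: the module name inplace_demo, the routine smooth_in_place and its 3-point moving average are hypothetical stand-ins for the real calculation. The work array holds a copy of the original values so that already-updated neighbours are not read back, and the caller is responsible for allocating it.

```fortran
module inplace_demo
  implicit none
  integer, parameter :: dp = kind(0.d0)
contains
  ! Hypothetical kernel: 3-point moving average of x, computed in place.
  ! The first and last elements are left unchanged.
  ! work is caller-provided scratch space of at least size(x) elements;
  ! its contents on entry are ignored. Nothing is allocated here.
  subroutine smooth_in_place(x, work)
    real(dp), intent(inout) :: x(:)
    real(dp), intent(inout) :: work(:)
    integer :: i, n

    n = size(x)
    if (size(work) < n) error stop "smooth_in_place: work array too small"

    ! Keep a copy of the original values: updating x directly would make
    ! x(i) depend on the already-smoothed x(i-1).
    work(1:n) = x
    do i = 2, n - 1
      x(i) = (work(i-1) + work(i) + work(i+1)) / 3.0_dp
    end do
  end subroutine smooth_in_place
end module inplace_demo
```

Because work is a plain (non-allocatable) dummy argument, the routine itself never allocates; a caller that smooths many arrays of the same length can pass the same work array every time.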
And you could write a wrapper subroutine that does the allocation for convenience:
subroutine convenient_subroutine(input,output)
allocate(work1(...), work2(...))
call do_computation(input,output,work1,work2)
deallocate(work1,work2)
end subroutine
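A compilable version of the two routines above might look like the sketch below. It assumes 1-D real(dp) arrays whose work arrays have the same length as input; the module name computation and the arithmetic inside do_computation (a placeholder computing (input + 1)**2) are illustrative assumptions, not part of the original answer.

```fortran
module computation
  implicit none
  integer, parameter :: dp = kind(0.d0)
contains
  ! Core routine: the caller supplies the scratch arrays, so nothing is
  ! allocated here. Placeholder arithmetic: output = (input + 1)**2.
  ! work1 and work2 must have the same size as input.
  subroutine do_computation(input, output, work1, work2)
    real(dp), intent(in)    :: input(:)
    real(dp), intent(out)   :: output(:)
    real(dp), intent(inout) :: work1(:), work2(:)   ! scratch space
    work1 = input**2
    work2 = 2.0_dp * input
    output = work1 + work2 + 1.0_dp
  end subroutine do_computation

  ! Convenience wrapper: allocates the scratch arrays, calls the core
  ! routine and frees them again. Easy to call, but it pays for an
  ! allocate/deallocate pair on every call.
  subroutine convenient_subroutine(input, output)
    real(dp), intent(in)  :: input(:)
    real(dp), intent(out) :: output(:)
    real(dp), allocatable :: work1(:), work2(:)
    allocate(work1(size(input)), work2(size(input)))
    call do_computation(input, output, work1, work2)
    ! Explicit for clarity; local allocatables are also freed automatically on return.
    deallocate(work1, work2)
  end subroutine convenient_subroutine
end module computation
```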
When performance is not critical, you can call convenient_subroutine; otherwise, call do_computation directly and try to share the work arrays between loop iterations and between different subroutines.
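To illustrate sharing work arrays between loop iterations, here is a sketch in which the scratch arrays are allocated once, before the loop, and handed to the kernel on every iteration. The module reuse_demo and the routines step and iterate are hypothetical; step stands in for whatever do_computation-style routine does the real work.

```fortran
module reuse_demo
  implicit none
  integer, parameter :: dp = kind(0.d0)
contains
  ! Hypothetical kernel with one scratch array: output = 2*input + 1.
  subroutine step(input, output, work)
    real(dp), intent(in)    :: input(:)
    real(dp), intent(out)   :: output(:)
    real(dp), intent(inout) :: work(:)   ! caller-provided scratch space
    work = 2.0_dp * input
    output = work + 1.0_dp
  end subroutine step

  ! Applies step() n_iter times, feeding each result back in.
  ! work and tmp are allocated once, outside the loop, and reused on
  ! every iteration instead of being reallocated each time.
  subroutine iterate(x, n_iter)
    real(dp), intent(inout) :: x(:)
    integer, intent(in)     :: n_iter
    real(dp), allocatable   :: work(:), tmp(:)
    integer :: it
    allocate(work(size(x)), tmp(size(x)))
    do it = 1, n_iter
      call step(x, tmp, work)
      x = tmp
    end do
  end subroutine iterate
end module reuse_demo
```

Compared with calling an allocating wrapper inside the loop, this performs one allocation per scratch array for the whole run instead of one per iteration.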
[1] http://ithare.com/infographics-operation-costs-in-cpu-clock-cycles/