Reputation: 45
The function below is used in my program the most.
CONTAINS
SUBROUTINE Delta4(R,Del)
REAL(DP), DIMENSION(:), INTENT(IN) :: R
REAL(DP), DIMENSION(:), INTENT(OUT) :: Del
INTEGER(I4B) :: i, j, r_n, c_n
REAL(DP) :: ar
!r_n=size(R,1);
Del=0.0_dp
do i=1,4; ar=ABS(R(i))
if (ar <= 1.0_dp) then
Del(i)=(3.0_dp-2.0_dp*ar+ &
sqrt(1.0_dp+4.0_dp*ar-4.0_dp*ar*ar))*0.125_dp
else !! if (1.0_dp < ar .and. ar <= 2.0_dp) then
Del(i)=(5.0_dp-2.0_dp*ar- &
sqrt(-7.0_dp+12.0_dp*ar-4.0_dp*ar*ar))*0.125_dp
end if
end do
R, Del
: length 4 vector
So I want to improve the speed of this function.
I as far as know, the if-else branch is slow. Furthermore, it is in a do loop.
How can I optimize it?
Upvotes: 1
Views: 757
Reputation:
IMO there is really little that you can gain on this function, which is essentially a bunch of arithmetic operations.
You can absorb the *0.125_dp
in the other constants.
Better, you can rewrite the computation as (pseudocode)
ar= 1 + 2 * ((ar > 1) - ar)
Del= (2 + ar + sqrt(1 - ar * ar)) * 0.125
This assumes an implicit conversion of .false.
to 0
and .true.
to 1
, which may not hold with your compiler. Hopefully, it is compiled as branchless.
As the vector length is just four, you can completely unroll the loop (but maybe the compiler already does it).
My bet is that won't make a visible difference.
To improve performance, you should review the code from a more global perspective, and maybe consider parallelization.
Update:
As pointed by @kvantour, I dropped the change of sign. We can fix with
i= 2 * (ar > 1) - 1
ar= i + 2 * (1 - ar)
Del= (2 + ar + i * sqrt(1 - ar * ar)) * 0.125
Alternatively,
i= SIGN(1, ar - 1)
if that turns out to be efficient.
Upvotes: 2