Reputation: 153
I am doing some long-term simulations in which I'm trying to achieve the highest possible accuracy in the solution of systems of ODEs. I am trying to find out how much time quadruple (128-bit) precision calculations take versus double (64-bit) precision. I googled around a bit and saw several opinions about it: some say it will take 4 times longer, others 60-70 times... So I decided to take matters in my own hands and I wrote a simple Fortran benchmark program:
program QUAD_TEST

implicit none

! Note: real kinds, not integer kinds
integer, parameter :: dp = selected_real_kind(15)
integer, parameter :: qp = selected_real_kind(33)

integer  :: cstart_dp,cend_dp,cstart_qp,cend_qp,crate
real     :: time_dp,time_qp
real(dp) :: sum_dp,sqrt_dp,pi_dp,mone_dp,zero_dp
real(qp) :: sum_qp,sqrt_qp,pi_qp,mone_qp,zero_qp
integer  :: i

! ==============================================================================
! == TEST 1. ELEMENTARY OPERATIONS ==
sum_dp = 1._dp
sum_qp = 1._qp
call SYSTEM_CLOCK(count_rate=crate)

write(*,*) 'Testing elementary operations...'

call SYSTEM_CLOCK(count=cstart_dp)
do i = 1,50000000
  sum_dp = sum_dp - 1._dp
  sum_dp = sum_dp + 1._dp
  sum_dp = sum_dp*2._dp
  sum_dp = sum_dp/2._dp
end do
call SYSTEM_CLOCK(count=cend_dp)
time_dp = real(cend_dp - cstart_dp)/real(crate)
write(*,*) 'DP sum: ',sum_dp
write(*,*) 'DP time: ',time_dp,' seconds'

call SYSTEM_CLOCK(count=cstart_qp)
do i = 1,50000000
  sum_qp = sum_qp - 1._qp
  sum_qp = sum_qp + 1._qp
  sum_qp = sum_qp*2._qp
  sum_qp = sum_qp/2._qp
end do
call SYSTEM_CLOCK(count=cend_qp)
time_qp = real(cend_qp - cstart_qp)/real(crate)
write(*,*) 'QP sum: ',sum_qp
write(*,*) 'QP time: ',time_qp,' seconds'

write(*,*)
write(*,*) 'DP is ',time_qp/time_dp,' times faster.'
write(*,*)

! == TEST 2. SQUARE ROOT ==
sqrt_dp = 2._dp
sqrt_qp = 2._qp

write(*,*) 'Testing square root ...'

call SYSTEM_CLOCK(count=cstart_dp)
do i = 1,10000000
  sqrt_dp = sqrt(sqrt_dp)
  sqrt_dp = 2._dp
end do
call SYSTEM_CLOCK(count=cend_dp)
time_dp = real(cend_dp - cstart_dp)/real(crate)
write(*,*) 'DP sqrt: ',sqrt_dp
write(*,*) 'DP time: ',time_dp,' seconds'

call SYSTEM_CLOCK(count=cstart_qp)
do i = 1,10000000
  sqrt_qp = sqrt(sqrt_qp)
  sqrt_qp = 2._qp
end do
call SYSTEM_CLOCK(count=cend_qp)
time_qp = real(cend_qp - cstart_qp)/real(crate)
write(*,*) 'QP sqrt: ',sqrt_qp
write(*,*) 'QP time: ',time_qp,' seconds'

write(*,*)
write(*,*) 'DP is ',time_qp/time_dp,' times faster.'
write(*,*)

! == TEST 3. TRIGONOMETRIC FUNCTIONS ==
pi_dp = acos(-1._dp); mone_dp = 1._dp; zero_dp = 0._dp
pi_qp = acos(-1._qp); mone_qp = 1._qp; zero_qp = 0._qp

write(*,*) 'Testing trigonometric functions ...'

call SYSTEM_CLOCK(count=cstart_dp)
do i = 1,10000000
  mone_dp = cos(pi_dp)
  zero_dp = sin(pi_dp)
end do
call SYSTEM_CLOCK(count=cend_dp)
time_dp = real(cend_dp - cstart_dp)/real(crate)
write(*,*) 'DP cos: ',mone_dp
write(*,*) 'DP sin: ',zero_dp
write(*,*) 'DP time: ',time_dp,' seconds'

call SYSTEM_CLOCK(count=cstart_qp)
do i = 1,10000000
  mone_qp = cos(pi_qp)
  zero_qp = sin(pi_qp)
end do
call SYSTEM_CLOCK(count=cend_qp)
time_qp = real(cend_qp - cstart_qp)/real(crate)
write(*,*) 'QP cos: ',mone_qp
write(*,*) 'QP sin: ',zero_qp
write(*,*) 'QP time: ',time_qp,' seconds'

write(*,*)
write(*,*) 'DP is ',time_qp/time_dp,' times faster.'
write(*,*)

end program QUAD_TEST
Results of a typical run, after compiling with gfortran 4.8.4 without any optimization flags:
Testing elementary operations...
DP sum: 1.0000000000000000
DP time: 0.572000027 seconds
QP sum: 1.00000000000000000000000000000000000
QP time: 4.32299995 seconds
DP is 7.55769205 times faster.
Testing square root ...
DP sqrt: 2.0000000000000000
DP time: 5.20000011E-02 seconds
QP sqrt: 2.00000000000000000000000000000000000
QP time: 2.60700011 seconds
DP is 50.1346169 times faster.
Testing trigonometric functions ...
DP cos: -1.0000000000000000
DP sin: 1.2246467991473532E-016
DP time: 2.79600000 seconds
QP cos: -1.00000000000000000000000000000000000
QP sin: 8.67181013012378102479704402604335225E-0035
QP time: 5.90199995 seconds
DP is 2.11087275 times faster.
Something must be going on here. My guess is that sqrt is computed by gfortran through an optimized algorithm, which probably hasn't been implemented for quadruple-precision computations. This might not be the case for sin and cos, but why are elementary operations 7.6 times slower in quadruple precision, while trigonometric functions slow down only by a factor of 2? If the algorithm used for the trigonometric functions were the same for quad and double precision, I would expect a sevenfold increase in CPU time for them too.
What is the average slowdown in scientific calculations when 128-bit precision is used, compared to 64-bit?
I am running this on an Intel i7-4771 @ 3.50GHz.
Upvotes: 5
Views: 975
Reputation: 81
It's interesting to note that if you change:
sqrt_qp = sqrt(sqrt_qp)
sqrt_qp = 2._qp
to
sqrt_qp = sqrt(2._qp)
the computation will be faster!
Upvotes: 0
Reputation: 18098
More an extended comment than an answer, but...
Current CPUs provide a lot of hardware acceleration for double-precision floating-point arithmetic. Some even provide facilities for extended precision. Beyond that, you are limited to software implementations, which are (as you noticed) considerably slower.
However, the exact factor of this slowdown is almost impossible to predict in general. It depends on your CPU (e.g. what acceleration it has built in) and on the software stack: for double precision you typically use different math libraries than for quadruple precision, and these may use different algorithms even for basic operations.
For a particular operation on given hardware, using the same algorithm in both precisions, you can probably derive a number, but it will certainly not hold universally.
Upvotes: 5