Reputation: 6918
Recently, I read a post on Stack Overflow about finding integers that are perfect squares. As I wanted to play with this, I wrote the following small program:
PROGRAM PERFECT_SQUARE
IMPLICIT NONE
INTEGER*8 :: N, M, NTOT
LOGICAL :: IS_SQUARE
N=Z'D0B03602181'
WRITE(*,*) IS_SQUARE(N)
NTOT=0
DO N=1,1000000000
IF (IS_SQUARE(N)) THEN
NTOT=NTOT+1
END IF
END DO
WRITE(*,*) NTOT ! should find 31622 squares
END PROGRAM
LOGICAL FUNCTION IS_SQUARE(N)
IMPLICIT NONE
INTEGER*8 :: N, M
! check if negative
IF (N.LT.0) THEN
IS_SQUARE=.FALSE.
RETURN
END IF
! check if ending 4 bits belong to (0,1,4,9)
M=IAND(N,15)
IF (.NOT.(M.EQ.0 .OR. M.EQ.1 .OR. M.EQ.4 .OR. M.EQ.9)) THEN
IS_SQUARE=.FALSE.
RETURN
END IF
! try to find the nearest integer to sqrt(n)
M=DINT(SQRT(DBLE(N)))
IF (M**2.NE.N) THEN
IS_SQUARE=.FALSE.
RETURN
END IF
IS_SQUARE=.TRUE.
RETURN
END FUNCTION
When compiling with gfortran -O2
, running time is 4.437 seconds, with -O3 it is 2.657 seconds. Then I thought that compiling with ifort -O2
could be faster since it might have a faster SQRT
function, but it turned out running time was now 9.026 seconds, and with ifort -O3
the same. I tried to analyze it using Valgrind, and the Intel compiled program indeed uses many more instructions.
My question is why? Is there a way to find out where exactly the difference comes from?
EDITS:
time ./a.out
and is the real/user time (sys was always almost 0)NTOT=0
before loop, should fix issue with other gfortran versionsWhen the complex IF
statement is removed, gfortran takes about 4 times as much time (10-11 seconds). This is to be expected since the statement approximately throws out about 75% of the numbers, avoiding to do the SQRT
on them. On the other hand, ifort only uses slightly more time. My guess is that something goes wrong when ifort tries to optimize the IF
statement.
EDIT2:
I tried with ifort version 12.1.2.273 it's much faster, so looks like they fixed that.
Upvotes: 4
Views: 5142
Reputation: 1188
What compiler versions are you using? Interestingly, it looks like a case where there is a performance regression from 11.1 to 12.0 -- e.g. for me, 11.1 (ifort -fast square.f90) takes 3.96s, and 12.0 (same options) took 13.3s. gfortran (4.6.1) (-O3) is still faster (3.35s). I have seen this kind of a regression before, although not quite as dramatic. BTW, replacing the if statement with
is_square = any(m == [0, 1, 4, 9])
if(.not. is_square) return
makes it run twice as fast with ifort 12.0, but slower in gfortran and ifort 11.1.
It looks like part of the problem is that 12.0 is overly aggressive in trying to vectorize things: adding
!DEC$ NOVECTOR
right before the DO loop (without changing anything else in the code) cuts the run time down to 4.0 sec.
Also, as a side benefit: if you have a multi-core CPU, try adding -parallel to the ifort command line :)
Upvotes: 5