Reputation: 1972
How I wish I had a minimum working example for this!
I'm doing a bunch of linear algebra using the HSL libraries. I've turned on every debugging flag I can think of.
On my workstation, my "deterministic" code rarely runs to completion. Most of the time, it complains of an indexing error:
forrtl: severe (408): fort: (3): Subscript #1 of the array W has value 0 which is less than the lower bound of 1
Image PC Routine Line Source
libintlc.dylib 0000000103C83E04 Unknown Unknown Unknown
libintlc.dylib 0000000103C8259E Unknown Unknown Unknown
libifcore.dylib 00000001031FBDA1 Unknown Unknown Unknown
libifcore.dylib 000000010316BA4E Unknown Unknown Unknown
libifcore.dylib 000000010316BFB3 Unknown Unknown Unknown
forrtl: severe (408): fort: (3): Subscript #1 of the array A has value 0 which is less than the lower bound of 1
Image PC Routine Line Source
libifcore.dylib 000000010ABDCC96 Unknown Unknown Unknown
Uniform2DSimplifi 00000001068851EE _ma48bd_ 1461 ma48d.f
Uniform2DSimplifi 000000010693619C _solve_sparse_mat 142 solve_sparse_matrix_d.f90
Uniform2DSimplifi 000000010693A7D8 _scale_and_solve_ 128 scale_and_solve_sparse_matrix_d.f90
Uniform2DSimplifi 000000010685740D _calc_simplified_ 598 calc_simplified_equations_B.f90
Uniform2DSimplifi 0000000106832176 _MAIN__ 161 uniform_2D_simplified_B.f90
Uniform2DSimplifi 000000010683175E Unknown Unknown Unknown
(You may notice that these are actually two different errors, even though I haven't changed a line of code between them.)
My code runs successfully ~70% of the time using the newer version of ifort on my laptop, but only ~20% of the time using the older version of ifort on my workstation. Oddly, the runs that succeed are often the first ones after a fresh compilation; after succeeding once, it gives that error on every later run. One time it worked, failed the second time, then worked the third time. (Sometimes on my laptop, it works for the first 2-3 runs but throws an error on the fourth.)
My own code is entirely deterministic: it sets up and solves linear algebra problems. It also calls the HSL routines, which evidently call MKL. I would assume that both HSL and MKL are deterministic, that is, identical inputs produce identical outputs. (They don't call RAND() or do file I/O....) Still, I'm not sure.
I looked up line 1461 of ma48d.f:
      CALL MA50BD(NR,NC,NZB,JOB5,A(K+1),IRN(K+1),KEEP(IPTRD+J1),
     +            CNTL5,ICNTL5,IW(J1),IQB,NP,LA-NEWNE-KK,
     +            A(NEWNE+KK+1),IRN(NEWNE+KK+1),KEEP(IPTRL+J1),
     +            KEEP(IPTRU+J1),W,IW(M+1),INFO5,RINFO5)
On my laptop, it's complaining because k has a value of -1 (causing the error), while it normally has a value of 0 (leading to success). What's bizarre about this is that I'm giving these routines the exact same inputs, and the code appears to be deterministic, so they should execute the exact same lines... yet they don't.
What could be causing this erratic behavior?
So far, I've thought of the following possibilities:
Upvotes: 1
Views: 325
Reputation: 57804
In my experience, compilers are far more reliable than programmers. That is, I would suspect the program of having a programming error unless it can be proved that bad code was generated.
This kind of error is almost certainly due to using an uninitialized value. Look for a variable that is not explicitly set to some value before being used.
program x
  integer :: i, arr(10)
  ! i is never given a starting value, so the loop begins wherever i happens to be
  do while (i <= 10)
    arr(i) = 0
    i = i + 1
  enddo
  print *, arr
end
Sometimes this code will set all the elements to zero. Other times it won't change a thing.
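For comparison, here is a minimal sketch of the same loop with the counter given a starting value. The names zero_fill_demo and zero_fill are made up for illustration; the only real change is the i = 1 line.

```fortran
module zero_fill_demo
  implicit none
contains
  ! Zero every element of arr, using an explicitly initialized counter.
  subroutine zero_fill(arr)
    integer, intent(out) :: arr(:)
    integer :: i
    i = 1                        ! the assignment that program x is missing
    do while (i <= size(arr))
      arr(i) = 0
      i = i + 1
    end do
  end subroutine zero_fill
end module zero_fill_demo
```

With integer :: arr(10) in the caller, call zero_fill(arr) sets all ten elements to zero on every run.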
A directly related, but more subtle lack-of-initialization error occurs in this logic:
program y
  integer :: sum, i, arrA(10), arrB(10)
  real :: ave(2)
  do i = 1, 10
    arrA(i) = i * 343
    arrB(i) = i * 121
  enddo
  sum = 0
  do i = 1, 10
    sum = sum + arrA(i)
  enddo
  ave(1) = sum / 10.0
  ! sum is not reset to 0 here, so it still holds the total of arrA
  do i = 1, 10
    sum = sum + arrB(i)
  enddo
  ave(2) = sum / 10.0
  print *, 'Averages are', ave
end
No compiler warning will show up for failing to reinitialize sum, though this sort of error is reproducible and deterministic.
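One way to make this mistake harder to commit is to keep the accumulator local to a small function, so that it is reset on every call. This is only a sketch: average_demo and mean_of are hypothetical names, not part of the program above.

```fortran
module average_demo
  implicit none
contains
  ! Mean of an integer array. The accumulator is local and is assigned zero
  ! at the start of every call, so nothing carries over from an earlier use.
  ! (Writing "integer :: total = 0" in the declaration instead would imply
  ! SAVE and zero it only once, which reintroduces the same bug.)
  pure function mean_of(values) result(mean)
    integer, intent(in) :: values(:)
    real :: mean
    integer :: total, i
    total = 0
    do i = 1, size(values)
      total = total + values(i)
    end do
    mean = real(total) / real(size(values))
  end function mean_of
end module average_demo
```

With the arrays from program y, mean_of(arrA) gives 1886.5 and mean_of(arrB) gives 665.5, whereas the version that reuses sum reports 2552.0 for the second average.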
Upvotes: 3
Reputation: 72
I cannot add a comment - hence the answer.
You can also try -ftrapuv (initialize stack variables to an unusual value). If you are using Intel 15 or higher, you can set -init=snan, which initializes 'save'd variables to a signaling NaN.
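If those flags are not available, the same idea can be applied by hand to individual work arrays: fill them with NaN before use and check afterwards whether any element was never overwritten. The sketch below uses the standard ieee_arithmetic module; poison_demo, poison and count_unset are illustrative names, and this is a hand-rolled analogue, not what the compiler flags do internally. As far as I know, a NaN only stops the program at the point of use if floating-point exception trapping is also enabled (for example -fpe0 with ifort); otherwise it just propagates into the results.

```fortran
module poison_demo
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_is_nan, &
                                           ieee_signaling_nan
  implicit none
contains
  ! Fill a real work array with signaling NaNs so that any element read
  ! before it has been assigned is easy to spot.
  subroutine poison(work)
    real, intent(out) :: work(:)
    work = ieee_value(work, ieee_signaling_nan)
  end subroutine poison

  ! Number of elements that still hold a NaN, i.e. were never overwritten.
  pure function count_unset(work) result(n)
    real, intent(in) :: work(:)
    integer :: n
    n = count(ieee_is_nan(work))
  end function count_unset
end module poison_demo
```

Calling poison(w) before a computation and count_unset(w) after it shows how many elements of w the computation never wrote.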
Upvotes: 1