Reputation: 21
Below are two simple, crude test programs that deliberately access out-of-bounds memory, one in CPU code and one in GPU code. I kept the GPU example separate so that the CPU example can be tested with different compilers and their behavior compared.
CPU example
module sizes
  integer, save :: size1
  integer, save :: size2
end module sizes
module arrays
  real, allocatable, save :: testArray1(:, :)
  real, allocatable, save :: testArray2(:, :)
end module arrays
subroutine testMemoryAccess
  use sizes
  use arrays
  implicit none
  real :: value
  value = testArray1(size1+1, size2+1)
  print *, 'value', value
end subroutine testMemoryAccess
Program testMemoryAccessOutOfBounds
  use sizes
  use arrays
  implicit none
  ! set sizes for the example
  size1 = 5000
  size2 = 2500
  allocate (testArray1(size1, size2))
  allocate (testArray2(size2, size1))
  testArray1 = 1.d0
  testArray2 = 2.d0
  call testMemoryAccess
end program testMemoryAccessOutOfBounds
GPU example
module sizes
  integer, save :: size1
  integer, save :: size2
end module sizes
module sizesCuda
  integer, device, save :: size1
  integer, device, save :: size2
end module sizesCuda
module arrays
  real, allocatable, save :: testArray1(:, :)
  real, allocatable, save :: testArray2(:, :)
end module arrays
module arraysCuda
  real, allocatable, device, save :: testArray1(:, :)
  real, allocatable, device, save :: testArray2(:, :)
end module arraysCuda
module cudaKernels
  use cudafor
  use sizesCuda
  use arraysCuda
contains
  attributes(global) Subroutine testMemoryAccessCuda
    implicit none
    integer :: element
    real :: value
    element = (blockIdx%x - 1)*blockDim%x + threadIdx%x
    if (element.eq.1) then
      value = testArray1(size1+1, size2+1)
      print *, 'value', value
    end if
  end Subroutine testMemoryAccessCuda
end module cudaKernels
Program testMemoryAccessOutOfBounds
  use cudafor
  use cudaKernels
  use sizes
  use sizesCuda, size1_d => size1, size2_d => size2
  use arrays
  use arraysCuda, testArray1_d => testArray1, testArray2_d => testArray2
  implicit none
  integer :: istat
  ! set sizes for the example
  size1 = 5000
  size2 = 2500
  size1_d = size1
  size2_d = size2
  allocate (testArray1_d(size1, size2))
  allocate (testArray2_d(size2, size1))
  testArray1_d = 1.d0
  testArray2_d = 2.d0
  call testMemoryAccessCuda<<<64, 64>>>
  istat = cudadevicesynchronize()
end program testMemoryAccessOutOfBounds
When I build the program with nvfortran and try to debug it, I get no warning or error for the out-of-bounds access. Looking at the available flags, both the -C and -Mbounds options appear to enable exactly this kind of check. However, they do not seem to work as intended.
When I use ifort for the same thing, the program aborts at run time and reports the exact line where the out-of-bounds access occurred.
How can I accomplish this using nvfortran? I thought it was a CUDA-specific problem; however, while creating the examples for this question, I found that nvfortran does the same thing on CPU code. Thus, it does not appear to be CUDA-specific.
nvfortran
nvfortran 23.5-0 64-bit target on x86-64 Linux -tp zen2
NVIDIA Compilers and Tools
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
ifort
ifort (IFORT) 2021.10.0 20230609
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.
nvfortran
I compile the examples as follows:
nvfortran -C -traceback -Mlarge_arrays -Mdclchk -cuda -gpu=cc86 testOutOfBounds.f90
nvfortran -C -traceback -Mlarge_arrays -Mdclchk -cuda -gpu=cc86 testOutOfBoundsCuda.f90
When running the CPU code, I get what looks like an uninitialized value:
value 1.5242136E-27
When running the GPU code, I get a zero value:
value 0.000000
ifort
I compile the cpu example as follows:
ifort -init=snan -C -fpe0 -g -traceback testOutOfBounds.f90
and I get:
forrtl: severe (408): fort: (2): Subscript #2 of the array TESTARRAY1 has value 2501 which is greater than the upper bound of 2500
Image PC Routine Line Source
a.out 00000000004043D4 testmemoryaccess_ 23 testOutOfBounds.f90
a.out 0000000000404FD6 MAIN__ 43 testOutOfBounds.f90
a.out 000000000040418D Unknown Unknown Unknown
libc.so.6 00007F65A9229D90 Unknown Unknown Unknown
libc.so.6 00007F65A9229E40 __libc_start_main Unknown Unknown
a.out 00000000004040A5 Unknown Unknown Unknown
which is actually what I expect the compiler to print.
Upvotes: 1
Views: 249
Reputation: 5646
Bounds checking isn't supported by nvfortran in device code and, as the following warning indicates, it is disabled whenever GPU-related flags are used:
% nvfortran -C -g -traceback -Mlarge_arrays -Mdclchk -cuda -gpu=cc86 test_bounds1.f90
nvfortran-Warning-CUDA Fortran or OpenACC GPU targets disables -Mbounds
When the same source is compiled for a CPU target only (without -cuda and -gpu), the out-of-bounds access is caught at run time:
% nvfortran -C -g -traceback -Mlarge_arrays -Mdclchk test_bounds1.f90; a.out
0: Subscript out of range for array testarray1 (test_bounds1.f90: 23)
subscript=5001, lower bound=1, upper bound=5000, dimension=1
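Since -Mbounds is not available for device code, one possible workaround is to guard the access by hand inside the kernel. The following is only a minimal sketch of that idea, not something nvfortran provides: the module, kernel and variable names (guardedKernels, guardedRead, a_d) are hypothetical, and the extents are passed explicitly as arguments on the assumption that this is acceptable in the real code.

```fortran
module guardedKernels
  use cudafor
  implicit none
contains
  ! Hypothetical kernel: reads a(i, j) only if both subscripts are in range.
  attributes(global) subroutine guardedRead(a, n1, n2, i, j)
    integer, value :: n1, n2, i, j   ! extents and subscripts, passed by value
    real :: a(n1, n2)                ! device array passed from the host
    ! Let a single thread do the check so the message is printed once.
    if (blockIdx%x == 1 .and. threadIdx%x == 1) then
      if (i < 1 .or. i > n1 .or. j < 1 .or. j > n2) then
        print *, 'out-of-bounds subscripts', i, j
      else
        print *, 'value', a(i, j)
      end if
    end if
  end subroutine guardedRead
end module guardedKernels

program testGuardedRead
  use cudafor
  use guardedKernels
  implicit none
  integer, parameter :: size1 = 5000, size2 = 2500
  real, allocatable, device :: a_d(:, :)
  integer :: istat
  allocate (a_d(size1, size2))
  a_d = 1.0
  ! Same deliberately invalid subscripts as in the question.
  call guardedRead<<<1, 1>>>(a_d, size1, size2, size1 + 1, size2 + 1)
  istat = cudaDeviceSynchronize()
  deallocate (a_d)
end program testGuardedRead
```

Compiled with the same -cuda -gpu=cc86 flags as in the question, this should report the bad subscripts instead of reading past the end of the array. Such guards cost a few comparisons per access, so they are probably best kept to debug builds.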
Upvotes: 2