Reputation: 1319
I read manual deep-copying of Fortran derived types is possible, but the following simple test program fails at run time; program compiled cleanly with PGI v16.10. What am getting wrong ?
program Test
implicit none
type dt
integer :: n
real, dimension(:), allocatable :: xm
end type dt
type(dt) :: grid
integer :: i
grid%n = 10
allocate(grid%xm(grid%n))
!$acc enter data copyin(grid)
!$acc enter data pcreate(grid%xm)
!$acc kernels
do i = 1, grid%n
grid%xm(i) = i * i
enddo
!$acc end kernels
print*,grid%xm
end program Test
The error I am getting is:
call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
call to cuMemFreeHost returned error 700: Illegal address during kernel execution
Upvotes: 0
Views: 826
Reputation: 5646
You just need to add a "present(grid)" clause on the kernels directive.
Here's an example of your program with the fix as well as a few other things like updating the data so it can be printed on the host.
% cat test.f90
program Test
implicit none
type dt
integer :: n
real, dimension(:), allocatable :: xm
end type dt
type(dt) :: grid
integer :: i
grid%n = 10
allocate(grid%xm(grid%n))
!$acc enter data copyin(grid)
!$acc enter data create(grid%xm)
!$acc kernels present(grid)
do i = 1, grid%n
grid%xm(i) = i * i
enddo
!$acc end kernels
!$acc update host(grid%xm)
print*,grid%xm
!$acc exit data delete(grid%xm, grid)
deallocate(grid%xm)
end program Test
% pgf90 -acc test.f90 -Minfo=accel -ta=tesla -V16.10; a.out
test:
16, Generating enter data copyin(grid)
17, Generating enter data create(grid%xm(:))
18, Generating present(grid)
19, Loop is parallelizable
Accelerator kernel generated
Generating Tesla code
19, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
23, Generating update self(grid%xm(:))
1.000000 4.000000 9.000000 16.00000
25.00000 36.00000 49.00000 64.00000
81.00000 100.0000
Note that PGI 17.7 will include beta support true deep copy in Fortran. As opposed to manual deep copy which you have above. Here's an example of using true deep copy:
% cat test_deep.f90
program Test
implicit none
type dt
integer :: n
real, dimension(:), allocatable :: xm
end type dt
type(dt) :: grid
integer :: i
grid%n = 10
allocate(grid%xm(grid%n))
!$acc enter data copyin(grid)
!$acc kernels present(grid)
do i = 1, grid%n
grid%xm(i) = i * i
enddo
!$acc end kernels
!$acc update host(grid)
print*,grid%xm
!$acc exit data delete(grid)
deallocate(grid%xm)
end program Test
% pgf90 -acc test_deep.f90 -Minfo=accel -ta=tesla:deepcopy -V17.7 ; a.out
test:
16, Generating enter data copyin(grid)
17, Generating present(grid)
18, Loop is parallelizable
Accelerator kernel generated
Generating Tesla code
18, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
22, Generating update self(grid)
1.000000 4.000000 9.000000 16.00000
25.00000 36.00000 49.00000 64.00000
81.00000 100.0000
Upvotes: 1