danny

Reputation: 1319

Fortran OpenACC derived types with allocatable components

I have read that manual deep copying of Fortran derived types is possible, but the following simple test program fails at run time. It compiled cleanly with PGI v16.10. What am I getting wrong?

program Test

    implicit none

    type dt
        integer :: n
        real, dimension(:), allocatable :: xm
    end type dt

    type(dt) :: grid
    integer :: i

    grid%n = 10
    allocate(grid%xm(grid%n))

!$acc enter data copyin(grid)
!$acc enter data pcreate(grid%xm)

!$acc kernels
   do i = 1, grid%n
      grid%xm(i) = i * i
   enddo
!$acc end kernels

   print*,grid%xm

end program Test

The error I am getting is:

call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
call to cuMemFreeHost returned error 700: Illegal address during kernel execution

Upvotes: 0

Views: 826

Answers (1)

Mat Colgrove

Reputation: 5646

You just need to add a "present(grid)" clause on the kernels directive.

Here's your program with the fix, plus a few other changes: an "update host" directive so the data can be printed on the host, and an "exit data" directive and a deallocate to clean up at the end.

% cat test.f90
program Test

    implicit none

    type dt
        integer :: n
        real, dimension(:), allocatable :: xm
    end type dt

    type(dt) :: grid
    integer :: i

    grid%n = 10
    allocate(grid%xm(grid%n))

!$acc enter data copyin(grid)
!$acc enter data create(grid%xm)
!$acc kernels present(grid)
   do i = 1, grid%n
      grid%xm(i) = i * i
   enddo
!$acc end kernels
!$acc update host(grid%xm)
   print*,grid%xm

!$acc exit data delete(grid%xm, grid)
   deallocate(grid%xm)

end program Test

% pgf90 -acc test.f90 -Minfo=accel -ta=tesla -V16.10; a.out
test:
     16, Generating enter data copyin(grid)
     17, Generating enter data create(grid%xm(:))
     18, Generating present(grid)
     19, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
         19, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     23, Generating update self(grid%xm(:))
    1.000000        4.000000        9.000000        16.00000
    25.00000        36.00000        49.00000        64.00000
    81.00000        100.0000
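
The ordering used above (parent copied in first, then the allocatable member, and the member deleted before the parent on exit) could be wrapped in a pair of helper routines so it is not repeated at every call site. This is only an untested sketch: the module name dt_device and the routines dt_enter and dt_exit are hypothetical and not part of any PGI or OpenACC API, and it assumes a compiler with the same manual deep copy behaviour as in the example above.

```fortran
module dt_device

    implicit none

    type dt
        integer :: n
        real, dimension(:), allocatable :: xm
    end type dt

contains

    ! Hypothetical helper: put the parent on the device first,
    ! then create the allocatable member
    subroutine dt_enter(g)
        type(dt), intent(inout) :: g
!$acc enter data copyin(g)
!$acc enter data create(g%xm)
    end subroutine dt_enter

    ! Hypothetical helper: bring the member back to the host,
    ! then delete the member before the parent
    subroutine dt_exit(g)
        type(dt), intent(inout) :: g
!$acc update host(g%xm)
!$acc exit data delete(g%xm)
!$acc exit data delete(g)
    end subroutine dt_exit

end module dt_device

program TestHelpers

    use dt_device
    implicit none

    type(dt) :: grid
    integer :: i

    grid%n = 10
    allocate(grid%xm(grid%n))

    call dt_enter(grid)

!$acc kernels present(grid)
    do i = 1, grid%n
        grid%xm(i) = i * i
    enddo
!$acc end kernels

    call dt_exit(grid)

    print*,grid%xm
    deallocate(grid%xm)

end program TestHelpers
```

Compiled without -acc, the directives are treated as comments and the program runs entirely on the host, which is a quick way to check the logic before testing on a device.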

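Note that PGI 17.7 will include beta support for true deep copy in Fortran, as opposed to the manual deep copy you have above. With true deep copy, enabled here with -ta=tesla:deepcopy, the separate data directives for grid%xm are not needed. Here's an example of using true deep copy: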

% cat test_deep.f90
program Test

    implicit none

    type dt
        integer :: n
        real, dimension(:), allocatable :: xm
    end type dt

    type(dt) :: grid
    integer :: i

    grid%n = 10
    allocate(grid%xm(grid%n))

!$acc enter data copyin(grid)
!$acc kernels present(grid)
   do i = 1, grid%n
      grid%xm(i) = i * i
   enddo
!$acc end kernels
!$acc update host(grid)
   print*,grid%xm

!$acc exit data delete(grid)
   deallocate(grid%xm)

end program Test

% pgf90 -acc test_deep.f90 -Minfo=accel -ta=tesla:deepcopy -V17.7 ; a.out
test:
     16, Generating enter data copyin(grid)
     17, Generating present(grid)
     18, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
         18, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     22, Generating update self(grid)
    1.000000        4.000000        9.000000        16.00000
    25.00000        36.00000        49.00000        64.00000
    81.00000        100.0000

Upvotes: 1
