I'm having trouble trying to declare a shared memory array within the kernel. Here's the code containing my kernel:
module my_kernels
use cudafor
implicit none
contains
attributes(global) subroutine mykernel(N)
! Declare variables
integer :: index
integer, intent(in), value :: N
real,shared,dimension(N) :: shared_array
! Map threadID to index
index = blockDim%x * (blockIdx%x-1) + threadIdx%x
! Set array element equal to index
shared_array(index) = index
end subroutine mykernel
end module my_kernels
And here's how I call my kernel:
program cuda
use my_kernels
implicit none
! Set number of threads
integer :: N = 9
! Invoke kernel with 3 blocks of 3 threads
call mykernel<<<N/3,3>>>(N)
end program cuda
I have all of this in one file, test.cuf. When I try to compile test.cuf with pgf90, I get this error:
PGF90-S-0000-Internal compiler error. unexpected runtime function call 0 (test.cuf: 34)
PGF90-S-0000-Internal compiler error. unsupported procedure 349 (test.cuf: 34)
0 inform, 0 warnings, 2 severes, 0 fatal for mykernel
/tmp/pgcudaforw5MgcaFALD9p.gpu(19): error: a value of type "int" cannot be assigned to an entity of type "float *"
/tmp/pgcudaforw5MgcaFALD9p.gpu(22): error: expected an expression
2 errors detected in the compilation of "/tmp/pgnvdl7MgHLY1VOV5.nv0".
PGF90-F-0000-Internal compiler error. pgnvd job exited with nonzero status code 0 (test.cuf: 34)
PGF90/x86-64 Linux 10.8-0: compilation aborted
In this case, line 34 refers to end subroutine mykernel. The compiler error is not very helpful; it took me a while to find out that the problem was the shared array (I'm using this code as a simple example).
When I replace 'N' with '9' in the declaration of the shared array, i.e. replace real,shared,dimension(N) :: shared_array with real,shared,dimension(9) :: shared_array, the error goes away.
My question is: why is this error occurring, and how do I set the dimension of a shared array with a variable (if that is possible at all)?
You can have more than one shared memory array, but their sizes must be known at compile time. In general, shared memory arrays should be of fixed size; the case where you can pass the size in bytes at runtime is somewhat exceptional. I guess this is all due to the limited amount of shared memory in each SM (Streaming Multiprocessor). In my experience developing in both CUDA C and CUDA Fortran, it is better to keep all these parameters fixed and then make the kernel repeat the work as many times as needed to cover all the input data. That way it is easier to control all the parameters that affect occupancy (how well you use the physical resources of the GPU).
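A minimal sketch of that "fixed size, repeat the work" pattern, assuming a hypothetical module fixed_shared, a compile-time tile size TILE = 128, and a toy kernel scale_kernel that just doubles each element. The shared buffer has a fixed size, and each block loops over tiles of the input until all n elements are covered. The kernel assumes it is launched with exactly TILE threads per block; every thread in a block runs the same number of loop iterations, so the syncthreads() calls are reached by the whole block.

```fortran
! Sketch: fixed-size shared memory plus a loop over tiles.
! Module name, kernel name and TILE value are hypothetical, for illustration.
module fixed_shared
  use cudafor
  implicit none
  integer, parameter :: TILE = 128   ! compile-time shared buffer size
contains
  attributes(global) subroutine scale_kernel(a, n)
    integer, value :: n
    real :: a(n)                       ! dummy of a global routine: device array
    real, shared :: buf(TILE)          ! fixed size, known at compile time
    integer :: base, i, t
    t = threadIdx%x
    ! Each block starts at its own tile and strides over the whole grid,
    ! so any n is covered. Assumes blockDim%x == TILE.
    do base = (blockIdx%x - 1) * TILE, n - 1, TILE * gridDim%x
      i = base + t
      if (i <= n) buf(t) = a(i)        ! stage the tile in shared memory
      call syncthreads()
      if (i <= n) a(i) = 2.0 * buf(t)  ! toy work: double each element
      call syncthreads()               ! buffer is reused next iteration
    end do
  end subroutine scale_kernel
end module fixed_shared

program main
  use cudafor
  use fixed_shared
  implicit none
  integer, parameter :: n = 1000
  real :: h(n)
  real, device :: d(n)
  integer :: i
  h = (/ (real(i), i = 1, n) /)
  d = h                                ! host -> device copy
  ! Only 4 blocks for 1000 elements, so each block loops over several tiles
  call scale_kernel<<<4, TILE>>>(d, n)
  h = d                                ! device -> host copy (synchronous)
  if (any(h /= (/ (2.0 * real(i), i = 1, n) /))) stop 1
  print *, 'OK'
end program main
```

Because TILE is a parameter, the shared memory used per block is fixed and easy to account for when tuning occupancy, while the grid size can be chosen independently of n.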
Change "dimension(N)" to "dimension(*)" and then pass the size of the shared array (in bytes) as the third argument of your kernel launch configuration (inside <<< >>>).
Hope this helps,
Mat
% cat test.cuf
module my_kernels
use cudafor
implicit none
real, dimension(:), allocatable,device :: Ad
real, dimension(:),allocatable :: Ah
contains
attributes(global) subroutine mykernel(N)
! Declare variables
integer :: index
integer, intent(IN), value :: N
real,shared,dimension(*) :: shared_array
! Map threadID to index
index = blockDim%x * (blockIdx%x-1) + threadIdx%x
! Set array element equal to index
shared_array(index) = index
Ad(index) = index
end subroutine mykernel
end module my_kernels
program cuda
use my_kernels
implicit none
! Set number of threads
integer :: N = 9
allocate(Ad(N), Ah(N))
! Invoke kernel with 3 blocks of 3 threads
call mykernel<<<N/3,3,N*4>>>(N)
Ah=Ad
print *, Ah
end program cuda
% pgf90 test.cuf -V10.9 ; a.out
1.000000 2.000000 3.000000 4.000000
5.000000 6.000000 7.000000 8.000000
9.000000