Reputation: 31
I'm writing a CUDA kernel that uses shared memory, but I'm having trouble declaring shared memory variables.
The problem appears when I statically allocate multiple shared memory arrays, as follows.
__global__
void kernel_func(float *global_matrix) {
    __shared__ float sm_mat1[4][4];
    __shared__ float sm_mat2[6][6];
    __shared__ float sm_mat3[3][3][3];
    if ( blockIdx.x==0 && blockIdx.y==0 && threadIdx.x==0 && threadIdx.y==0 )
        printf("sizeof(sm_mat1)=%d, sizeof(sm_mat2)=%d, sizeof(sm_mat3)=%d.\n",
               sizeof(sm_mat1), sizeof(sm_mat2), sizeof(sm_mat3));
    ...
}
However, when I run it, it prints a strange message:
sizeof(sm_mat1)=64, sizeof(sm_mat2)=0, sizeof(sm_mat3)=128
It seems that the 2nd matrix is not allocated, and the 3rd matrix is allocated in place of the 2nd.
In fact, accessing the 2nd matrix does not work correctly (I cannot read or write data).
I'm using a GTX 480 and cuda2.0.
(I print the message by compiling with the -arch=sm_20 option.)
Does anyone have any thoughts?
Upvotes: 1
Views: 219
Reputation: 508
The sizeof operator returns std::size_t, not int.
So on systems where sizeof(size_t) == 8 and sizeof(int) == 4, when you pass its result to printf and print it with %d, a single sizeof result is split into two halves that are consumed by consecutive %d specifiers. The zero printed in place of the second specifier is the upper half of the first sizeof result.
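To make the "two halves" idea concrete, here is a minimal host-side sketch that models it with explicit arithmetic. It assumes a 64-bit size_t and a little-endian layout. The helper name split_halves is hypothetical and not part of CUDA or the question's code. Passing a size_t to %d is formally undefined behavior, so this only models the effect described above; it does not reproduce what printf does internally.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper: returns the (low, high) 32-bit halves of a 64-bit value.
// Under the assumptions above, two consecutive %d specifiers would consume
// these halves, low half first.
std::pair<std::uint32_t, std::uint32_t> split_halves(std::uint64_t value) {
    std::uint32_t low  = static_cast<std::uint32_t>(value & 0xFFFFFFFFu);
    std::uint32_t high = static_cast<std::uint32_t>(value >> 32);
    return std::make_pair(low, high);
}
```

For sizeof(sm_mat1), which is 64 when float is 4 bytes, split_halves(64) gives a low half of 64 and a high half of 0, which matches the 64 followed by 0 in the reported output.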
To correct the output, you can explicitly cast the sizeof result to int, or try the %ld or %lld specifiers.
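As a rough illustration of the cast-based fix, the following host-side sketch formats the three sizes with specifiers that match the argument types. The typedefs mirror the array shapes from the question. The function names format_sizes_int and format_sizes_llu are hypothetical, and the %llu variant with a cast to unsigned long long is my own choice of a matching pair, not something taken from the answer. The same casts should carry over to printf inside a kernel.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Same shapes as the shared arrays in the question.
typedef float Mat1[4][4];
typedef float Mat2[6][6];
typedef float Mat3[3][3][3];

static_assert(sizeof(float) == 4, "this sketch assumes a 4-byte float");

// Variant 1: cast each sizeof result to int so that it matches %d.
std::string format_sizes_int() {
    char buf[128];
    int n = std::snprintf(buf, sizeof(buf),
        "sizeof(sm_mat1)=%d, sizeof(sm_mat2)=%d, sizeof(sm_mat3)=%d.",
        static_cast<int>(sizeof(Mat1)),
        static_cast<int>(sizeof(Mat2)),
        static_cast<int>(sizeof(Mat3)));
    assert(n > 0 && n < static_cast<int>(sizeof(buf)));
    return std::string(buf);
}

// Variant 2: keep the full width by casting to unsigned long long for %llu.
std::string format_sizes_llu() {
    char buf[128];
    int n = std::snprintf(buf, sizeof(buf),
        "sizeof(sm_mat1)=%llu, sizeof(sm_mat2)=%llu, sizeof(sm_mat3)=%llu.",
        static_cast<unsigned long long>(sizeof(Mat1)),
        static_cast<unsigned long long>(sizeof(Mat2)),
        static_cast<unsigned long long>(sizeof(Mat3)));
    assert(n > 0 && n < static_cast<int>(sizeof(buf)));
    return std::string(buf);
}
```

With 4-byte floats, both variants produce sizes of 64, 144 and 108 bytes, because each specifier now consumes exactly one argument of the expected width.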
However, I can't reproduce the bug with the incorrect second array size.
Upvotes: 5