Yoshiharu Imamoto

Reputation: 31

CUDA shared memory cannot be allocated

I'm writing a CUDA kernel that uses shared memory, but I'm having trouble declaring shared memory variables.

The problem appears when I statically allocate several shared memory arrays, as follows.

__global__
void kernel_func(float *global_matrix) {
    __shared__ float sm_mat1[4][4];
    __shared__ float sm_mat2[6][6];
    __shared__ float sm_mat3[3][3][3];

    if ( blockIdx.x==0 && blockIdx.y==0 && threadIdx.x==0 && threadIdx.y==0 )
        printf("sizeof(sm_mat1)=%d, sizeof(sm_mat2)=%d, sizeof(sm_mat3)=%d.\n",
                    sizeof(sm_mat1), sizeof(sm_mat2), sizeof(sm_mat3));

    ...
}

However, when I run it, it prints the following strange output.

sizeof(sm_mat1)=64, sizeof(sm_mat2)=0, sizeof(sm_mat3)=128

It seems that the 2nd matrix is not allocated, and that the 3rd matrix is allocated in its place.
In fact, accessing the 2nd matrix does not work correctly (I cannot read or write data).

I'm using a GTX 480 and cuda2.0. (I compile with -arch=sm_20 so that I can print messages from the kernel.)

Does anyone have any thoughts?

Upvotes: 1

Views: 219

Answers (1)

SteelRaven

Reputation: 508

The sizeof operator returns std::size_t, not int. On systems where sizeof(size_t) == 8 and sizeof(int) == 4, passing a sizeof result to printf and printing it with %d splits that one result into two halves, which are consumed by two consecutive %d specifiers. The zero printed for the second specifier is the upper half of the first sizeof result.
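To make the "two halves" concrete, here is a small host-side sketch that splits a 64-bit size into its 32-bit halves explicitly. It deliberately does not pass a size_t to %d, since a mismatched format specifier is undefined behavior. The names Halves, split_halves and halves_of_first_size are hypothetical, and the sketch assumes sizeof(float) == 4 and a layout in which the low half is read first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: the two 32-bit halves of one 64-bit value.
struct Halves {
    std::uint32_t low;   // what the first %d would pick up
    std::uint32_t high;  // what the second %d would pick up
};

// Split a 64-bit value into its lower and upper 32 bits.
Halves split_halves(std::uint64_t value) {
    Halves h;
    h.low  = static_cast<std::uint32_t>(value & 0xFFFFFFFFu);
    h.high = static_cast<std::uint32_t>(value >> 32);
    return h;
}

// Halves of sizeof(float[4][4]), the size of sm_mat1 in the question.
Halves halves_of_first_size() {
    assert(sizeof(float) == 4);  // assumption stated above
    return split_halves(static_cast<std::uint64_t>(sizeof(float[4][4])));
}
```

Under those assumptions, halves_of_first_size() gives low == 64 and high == 0, which matches the first two numbers in the question's output.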

To get the correct output, explicitly cast the sizeof result to int, or try the %ld or %lld specifiers.
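A minimal host-side sketch of the cast approach follows. It formats into a buffer with snprintf rather than printing from a kernel, so it can be checked without a GPU. The function name describe_sizes and the type aliases are hypothetical, and the sizes assume sizeof(float) == 4.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Same shapes as the shared arrays in the question (hypothetical aliases).
using Mat1 = float[4][4];
using Mat2 = float[6][6];
using Mat3 = float[3][3][3];

// Build the message with every sizeof result cast to int,
// so that each %d receives an argument of the type it expects.
std::string describe_sizes() {
    char buf[128];
    int written = std::snprintf(
        buf, sizeof(buf),
        "sizeof(sm_mat1)=%d, sizeof(sm_mat2)=%d, sizeof(sm_mat3)=%d.",
        static_cast<int>(sizeof(Mat1)),
        static_cast<int>(sizeof(Mat2)),
        static_cast<int>(sizeof(Mat3)));
    // snprintf returns the length it needed; check that it fit.
    assert(written > 0 && written < static_cast<int>(sizeof(buf)));
    return std::string(buf);
}
```

With 4-byte floats this yields "sizeof(sm_mat1)=64, sizeof(sm_mat2)=144, sizeof(sm_mat3)=108.". The same casts can be applied to the printf call inside the kernel.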

That said, I also can't reproduce the incorrect size for the second array.

Upvotes: 5
