CUDA shared memory cannot be allocated

Question

I'm writing a CUDA kernel that uses shared memory, but I'm having trouble declaring shared memory variables.

The problem appears when I statically allocate several shared memory arrays, as follows.

__global__
void kernel_func(float *global_matrix) {
    __shared__ float sm_mat1[4][4];
    __shared__ float sm_mat2[6][6];
    __shared__ float sm_mat3[3][3][3];

    if ( blockIdx.x==0 && blockIdx.y==0 && threadIdx.x==0 && threadIdx.y==0 )
        printf("sizeof(sm_mat1)=%d, sizeof(sm_mat2)=%d, sizeof(sm_mat3)=%d.\n",
                    sizeof(sm_mat1), sizeof(sm_mat2), sizeof(sm_mat3));

    ...
}

However, when I run it, it prints a strange message:

sizeof(sm_mat1)=64, sizeof(sm_mat2)=0, sizeof(sm_mat3)=128

It seems that the 2nd matrix is not allocated, and the 3rd matrix is allocated in its place.
In fact, accessing the 2nd matrix does not work correctly (I cannot read or write its data).
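
For reference, the sizes the three declarations should report can be checked on the host. This is only a minimal sketch, assuming a 4-byte float and array types that mirror the kernel's declarations; the typedef names and the helper expected_sizes_line are hypothetical and not part of the original code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Same element type and extents as the __shared__ arrays in the kernel.
typedef float Mat1[4][4];
typedef float Mat2[6][6];
typedef float Mat3[3][3][3];

// Hypothetical helper: formats the three sizes in the same layout as the
// kernel's printf, but with %zu, the specifier that matches size_t.
std::string expected_sizes_line() {
    // Assumption: float occupies 4 bytes.
    assert(sizeof(float) == 4);

    char buf[128];
    std::snprintf(buf, sizeof(buf),
                  "sizeof(sm_mat1)=%zu, sizeof(sm_mat2)=%zu, sizeof(sm_mat3)=%zu.",
                  sizeof(Mat1), sizeof(Mat2), sizeof(Mat3));
    return std::string(buf);
}
```

With a 4-byte float the sizes are 16 x 4 = 64, 36 x 4 = 144 and 27 x 4 = 108 bytes, so neither the 0 nor the 128 in the output corresponds to any of the declared arrays.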

I'm using a GTX 480 and CUDA 2.0. (I compile with -arch=sm_20 so that printf works inside the kernel.)

Does anyone have any thoughts?
