Min Lin

Reputation: 3197

CUDA: strange memory access error on shared memory

I've been stuck on this for the whole day. The following program fails with an "Out of range shared or local address" error. Commenting out this line makes the error go away:

hist[tidx] = 0;

However, I don't think allocating 88*4 bytes of shared memory should be a problem.

Commenting out this line also makes the error go away:

NVMatrix Acts(acts, true);

It seems that if I allocate the Acts matrix in global memory, the shared memory behaves abnormally. Any ideas?

int main(int argc, char ** argv)
{
    float * act = new float[2985984];
    for (int i=0; i<2985984; i++)
        act[i] = 0.0001*(i+1);

    Matrix acts(act, 23328, 128);   // initialize a 23328x128 CPU matrix from act

    NVMatrix Acts(acts, true);      // create an Acts matrix in GPU global memory and copy the values from CPU to GPU
                                    // if this line is commented out, the program runs without error

    float cost = Calculate();

}

float Calculate()
{
    dim3 blocks(4,96);
    dim3 threads(32,8);

    cudaFuncSetCacheConfig(createShare<8, 32>, cudaFuncCachePreferShared);

    int numLabels = 88;

    createShare<8, 32><<<blocks, threads, numLabels>>>(numLabels);

    return 0;
}

template <int B_Y, int B_X>
__global__ void createShare(int numLabels)
{
    extern __shared__ float hist[];

    int tidx = threadIdx.y * B_X + threadIdx.x;
    if (tidx<numLabels) {
        printf("block %d %d %d\n", blockIdx.x, blockIdx.y, tidx);
        hist[tidx] = 0;
    }
}

Upvotes: 0

Views: 422

Answers (1)

Robert Crovella

Reputation: 151799

Change this:

createShare<8, 32><<<blocks, threads, numLabels>>>(numLabels);

to this:

createShare<8, 32><<<blocks, threads, numLabels*sizeof(float)>>>(numLabels);

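The size of the dynamic shared memory allocation that you pass to the kernel launch is in bytes, not elements. With numLabels = 88, the original launch reserved only 88 bytes, which is room for 22 float values, so the threads with tidx from 22 to 87 wrote past the end of the allocation. You need to allocate enough bytes to cover 88 float quantities, i.e. 88*sizeof(float) = 352 bytes.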
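For reference, here is a minimal self-contained sketch of the corrected launch, stripped of the Matrix/NVMatrix code from the question. The helper histBytes and the error-checking code are illustrative additions, not part of the original program; the kernel, grid, and block dimensions are taken from the question.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not in the original code): number of bytes of
// dynamic shared memory needed to hold numLabels floats.
static size_t histBytes(int numLabels)
{
    return static_cast<size_t>(numLabels) * sizeof(float);
}

template <int B_Y, int B_X>
__global__ void createShare(int numLabels)
{
    // Sized at launch time by the third <<<...>>> parameter (in bytes).
    extern __shared__ float hist[];

    int tidx = threadIdx.y * B_X + threadIdx.x;
    if (tidx < numLabels) {
        hist[tidx] = 0.0f;
    }
}

int main()
{
    dim3 blocks(4, 96);
    dim3 threads(32, 8);
    int numLabels = 88;

    // Pass a byte count, not an element count, as the third parameter.
    createShare<8, 32><<<blocks, threads, histBytes(numLabels)>>>(numLabels);

    cudaError_t err = cudaGetLastError();      // errors detected at launch
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();         // errors raised while the kernel ran
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return EXIT_FAILURE;
    }

    std::printf("kernel finished, %zu bytes of dynamic shared memory per block\n",
                histBytes(numLabels));
    return EXIT_SUCCESS;
}
```

Computing the size in one place and checking the result of cudaDeviceSynchronize() is one way to make this kind of mismatch easier to spot; the check itself does not change the fix.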

Upvotes: 6
