John W.

Reputation: 153

Segmentation Fault with 3D array

I am trying to work with 3D arrays in CUDA (200x200x100).

The moment I change my z dimension (model_num) from 4 to 5, I get a segmentation fault. Why, and how can I fix it?

const int nrcells = 200;
const int nphicells = 200;
const int model_num = 5; //So far, 4 is the maximum model_num that works. At 5 and after, there is a segmentation fault

__global__ void kernel(float* mgridb)
{
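    const unsigned long long int  i = (blockIdx.y * gridDim.x + blockIdx.x) * blockDim.x + threadIdx.x;
    // Assumed mapping, inferred from the launch configuration in main():
    const int tx = threadIdx.x; // phi index (one thread per phi cell)
    const int ty = blockIdx.x;  // r index
    const int tz = blockIdx.y;  // model index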

    if(tx >= 0 && tx < nphicells && ty >=0 && ty < nrcells && tz >= 0  && tz < model_num){
        //Do stuff with mgridb[i]
    }
}

int main (void)
{

    unsigned long long int size_matrices = nphicells*nrcells*model_num; 
    unsigned long long int mem_size_matrices = sizeof(float) * size_matrices;

    float *h_mgridb = (float *)malloc(mem_size_matrices);
    float mgridb[nphicells][nrcells][model_num];

    for(int k = 0; k < model_num; k++){
        for(int j = 0; j < nrcells; j++){
            for(int i = 0; i < nphicells; i++){
                mgridb[i][j][k] = 0;
            }
        }
    }
    float *d_mgridb;

    cudaMalloc( (void**)&d_mgridb, mem_size_matrices );
    cudaMemcpy(d_mgridb, h_mgridb, mem_size_matrices, cudaMemcpyHostToDevice);

    int threads = nphicells;
    uint3 blocks = make_uint3(nrcells,model_num,1);
    kernel<<<blocks,threads>>>(d_mgridb);
    cudaMemcpy( h_mgridb, d_mgridb, mem_size_matrices, cudaMemcpyDeviceToHost);
    cudaFree(d_mgridb);
    return 0;
}

Upvotes: 0

Views: 213

Answers (1)

Robert Crovella

Reputation: 151889

This array is allocated on the stack:

float mgridb[nphicells][nrcells][model_num];

Your stack space is limited. When you exceed the amount you can store on the stack, you get a seg fault, either at the point of allocation or as soon as you try to access the array.
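To put numbers on it, here is a minimal sketch that computes how many bytes the array needs. `grid_bytes` is a hypothetical helper, not something from your code. Assuming `sizeof(float)` is 4, this gives 640,000 bytes at model_num = 4, 800,000 bytes at model_num = 5, and 16,000,000 bytes for the full 200x200x100 case. The stack limit itself depends on your platform and settings, so the sketch cannot tell you exactly where the failure starts.

```cpp
#include <cstddef>

// Hypothetical helper: bytes occupied by float[nphicells][nrcells][model_num].
constexpr std::size_t grid_bytes(std::size_t nphicells, std::size_t nrcells,
                                 std::size_t model_num) {
    return nphicells * nrcells * model_num * sizeof(float);
}

// Sizes for the dimensions mentioned in the question.
constexpr std::size_t bytes_at_4   = grid_bytes(200, 200, 4);
constexpr std::size_t bytes_at_5   = grid_bytes(200, 200, 5);
constexpr std::size_t bytes_at_100 = grid_bytes(200, 200, 100);
```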

Use malloc instead. That allocates heap storage, which has much higher limits.
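As a rough sketch of what that could look like on the host side, the function below allocates the whole grid on the heap. `alloc_grid` is a hypothetical name, and I'm using `calloc` rather than `malloc` here only because it zero-fills the storage, which would replace your zeroing loops. This assumes IEEE 754 floats, where all-zero bytes represent 0.0f.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: allocate a zero-initialized grid on the heap.
// Returns nullptr if the allocation fails; the caller must free() the result.
float* alloc_grid(std::size_t nphicells, std::size_t nrcells,
                  std::size_t model_num) {
    const std::size_t count = nphicells * nrcells * model_num;
    // calloc zero-fills the block (all-zero bytes are 0.0f for IEEE 754 floats).
    return static_cast<float*>(std::calloc(count, sizeof(float)));
}
```

In your main you would call something like `float *h_mgridb = alloc_grid(nphicells, nrcells, model_num);`, check the result for a null pointer, use it as the source of the cudaMemcpy, and call `free(h_mgridb)` at the end.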

None of the above has anything to do with CUDA. Furthermore, it's not unique or specific to "3D" arrays. Any large stack-based allocation (e.g. a 1D array) is going to have the same trouble.

You may also have to adjust how you access the array, but it's not difficult to handle a flattened array using pointer indexing.
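For example, a sketch of flattened indexing might look like the following. `flat_index` and `fill_grid` are hypothetical helpers, and the layout chosen here (k varying fastest) simply mirrors how `float[nphicells][nrcells][model_num]` is laid out in memory.

```cpp
#include <cstddef>

// Hypothetical helper: offset of element [i][j][k] in a flat array laid out
// like float[nphicells][nrcells][model_num], i.e. k varies fastest.
inline std::size_t flat_index(std::size_t i, std::size_t j, std::size_t k,
                              std::size_t nrcells, std::size_t model_num) {
    return (i * nrcells + j) * model_num + k;
}

// Hypothetical helper: set every element of a flat grid to `value`.
void fill_grid(float* grid, std::size_t nphicells, std::size_t nrcells,
               std::size_t model_num, float value) {
    for (std::size_t i = 0; i < nphicells; i++) {
        for (std::size_t j = 0; j < nrcells; j++) {
            for (std::size_t k = 0; k < model_num; k++) {
                grid[flat_index(i, j, k, nrcells, model_num)] = value;
            }
        }
    }
}
```

Note that the kernel in the question computes its index with threadIdx.x varying fastest, which appears to be a different ordering from the one above. Whichever formula you pick, the host and device code need to agree on it.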

Your code is also strange looking: you create an appropriately sized array h_mgridb using malloc and copy that array to the device (into d_mgridb), but your loops initialize mgridb, which is never copied anywhere, so h_mgridb is sent to the device uninitialized. It's not clear what purpose mgridb serves in your code. h_mgridb and mgridb are not the same array.

Upvotes: 3
