Mysterious Seg Faults with Cudamalloc

Question

Can anyone help me to understand why the following code causes a segmentation fault? Likewise, can anyone help me understand why swapping out the two lines labelled "bad" for the two lines labelled "good" does not result in a segmentation fault?

Note that the seg fault seems to occur at the cudaMalloc line; if I comment that out I also do not see a segmentation fault. These allocations seem to be stepping on each other, but I don't understand how.

The intent of the code is to set up three structures: h_P on the host, which will be populated by a CPU routine d_P on the device, which will be populated by a GPU routine h_P_copy on the host, which will be populated by copying the GPU data structure back in.

That way I can verify correct behavior and benchmark one vs the other.
All of those are, indeed, four-dimensional arrays.

(If it matters, the card in question is a GTX 580, using nvcc 4.2 under SUSE Linux)

#define NUM_STATES              32
#define NUM_MEMORY              16

int main( int argc, char** argv) {

        // allocate and create P matrix
        int P_size      = sizeof(float) * NUM_STATES * NUM_STATES * NUM_MEMORY * NUM_MEMORY;
        // float *h_P      = (float*) malloc (P_size);  **good**
        // float *h_P_copy = (float*) malloc (P_size);  **good**
        float h_P[P_size];                            //  **bad**
        float h_P_copy[P_size];                       //  **bad**
        float *d_P;
        cudaMalloc( (void**) &d_P, P_size);
        cudaMemset( d_P, 0.0, P_size);

}

Robert Crovella · Accepted Answer

This is likely due to stack corruption of some sort.

Notes:

The "good" lines allocate out of the system heap, the "bad" lines allocate stack storage.
Normally the amount you can allocate from the stack is quite a bit smaller than what you can allocate from the heap.
The "good" and "bad" declarations are not reserving the same amount of float storage. The "bad" are allocating 4x as much float storage.
Finally, cudaMemset, just like memset, is setting bytes and expects a unsigned char quantity, not a float (0.0) quantity.

Since the cudaMalloc line is the first one that actually "uses" (attempts to set) any of the allocated stack storage in the "bad" case, it is where the seg fault occurs. If you added an additional declaration like so:

    float *d_P;
    float myval;  //add
    myval = 0.0f; //add2
    cudaMalloc( (void**) &d_P, P_size);

I suspect you might see the seg fault occur on the "add2" line, as it would then be the first to make use of the corrupted stack storage.

Mysterious Seg Faults with Cudamalloc

Answers (2)

Related Questions