Reputation: 4779
Can anyone help me to understand why the following code causes a segmentation fault? Likewise, can anyone help me understand why swapping out the two lines labelled "bad" for the two lines labelled "good" does not result in a segmentation fault?
Note that the seg fault seems to occur at the cudaMalloc line; if I comment that out I also do not see a segmentation fault. These allocations seem to be stepping on each other, but I don't understand how.
The intent of the code is to set up three structures:
- h_P on the host, which will be populated by a CPU routine
- d_P on the device, which will be populated by a GPU routine
- h_P_copy on the host, which will be populated by copying the GPU data structure back in
That way I can verify correct behavior and benchmark one vs the other.
All of those are, indeed, four-dimensional arrays.
(If it matters, the card in question is a GTX 580, using nvcc 4.2 under SUSE Linux)
#define NUM_STATES 32
#define NUM_MEMORY 16
int main( int argc, char** argv) {
    // allocate and create P matrix
    int P_size = sizeof(float) * NUM_STATES * NUM_STATES * NUM_MEMORY * NUM_MEMORY;
    // float *h_P = (float*) malloc (P_size); **good**
    // float *h_P_copy = (float*) malloc (P_size); **good**
    float h_P[P_size]; // **bad**
    float h_P_copy[P_size]; // **bad**
    float *d_P;
    cudaMalloc( (void**) &d_P, P_size);
    cudaMemset( d_P, 0.0, P_size);
}
Upvotes: 1
Views: 2417
Reputation: 517
The two lines labeled good are allocating 262144 * sizeof(float) bytes. The two lines labeled bad are allocating 262144 * sizeof(float) * sizeof(float) bytes.
Upvotes: 1
Reputation: 152173
This is likely due to stack corruption of some sort.
Notes:
- The "good" lines are allocating P_size bytes of float storage. The "bad" lines are allocating 4x as much float storage.
- cudaMemset, just like memset, sets bytes and expects an unsigned char quantity, not a float (0.0) quantity.
Since the cudaMalloc line is the first one that actually "uses" (attempts to set) any of the allocated stack storage in the "bad" case, it is where the seg fault occurs. If you added an additional declaration like so:
float *d_P;
float myval; //add
myval = 0.0f; //add2
cudaMalloc( (void**) &d_P, P_size);
I suspect you might see the seg fault occur on the "add2" line, as it would then be the first to make use of the corrupted stack storage.
Upvotes: 3