Struct of multidimensional arrays in CUDA

Question

I am having problems with kernels that use a large struct containing 1D, 2D and 3D arrays, like the following example:

typedef struct {
    float  data[N];
    int    map_a[N][M];
    int    map_b[N][M];
    int    map_c[B][N][M];
    ...
} my_struct;

When I use a struct like that, a kernel that does my_struct->map_d[x][y] = value hangs the program. Of course, x and y are within the array bounds. However, another kernel that uses the same struct and writes to map_a works as expected. I suspect that invalid memory accesses are occurring in the multidimensional arrays, so I would appreciate help with a few doubts about allocating multidimensional arrays inside a struct in CUDA:

1. As suggested in the CUDA Programming Guide, the function cudaMallocPitch() should be used to allocate arrays, guaranteeing the padding needed when the array size does not match the warp size. Would this struct be correctly allocated with cudaMalloc(&my_struct, sizeof(my_struct))? That is, is cudaMallocPitch() used internally to ensure array padding?
2. As I suspect the answer to the first question is 'no', how could I correctly allocate a struct of multidimensional arrays on the device using cudaMallocPitch()? And how could I properly cudaMemcpy() that struct between host and device?
3. Some answers on Stack Overflow about issues with 2D arrays suggest flattening the arrays. I have tried it and have no problems with flattened arrays. Is that the only way to properly use a struct of multidimensional arrays in CUDA?

Another solution would be to use separate arrays instead of a struct, but I have a large number of arrays that would then have to be passed to the kernels. I hope someone has some advice on how to solve this issue.

talonmies · Accepted Answer

As suggested in the CUDA Programming Guide, the function cudaMallocPitch() should be used to allocate arrays, guaranteeing the padding needed when the array size does not match the warp size.

That is not what the padding is for. It exists primarily to satisfy hardware restrictions, such as the cache line size of the texturing unit. It is unnecessary for general allocations, and your use case does not require it. For statically defined structures, the compiler ensures safe alignment of the structure and will emit warnings or errors if it can detect that you are trying to do something that would violate the static alignment requirements of the hardware.

Would this struct be correctly allocated with cudaMalloc(&my_struct, sizeof(my_struct))?

Yes
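A minimal sketch of what this means in practice, written as plain host C so it compiles without a GPU. The sizes N = 4, M = 3, B = 2 and the helpers fill() and copy_struct() are hypothetical, and memcpy stands in for cudaMemcpy: both take a destination, a source and a byte count, and neither knows anything about the struct's members. Copying sizeof(my_struct) bytes in one call moves every element of every 1D, 2D and 3D member.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical small sizes; the question does not give N, M or B. */
#define N 4
#define M 3
#define B 2

typedef struct {
    float data[N];
    int   map_a[N][M];
    int   map_b[N][M];
    int   map_c[B][N][M];
} my_struct;

/* Fill every member with values derived from the element indices. */
void fill(my_struct *s) {
    for (int x = 0; x < N; ++x) {
        s->data[x] = (float)x;
        for (int y = 0; y < M; ++y) {
            s->map_a[x][y] = x * M + y;
            s->map_b[x][y] = -(x * M + y);
            for (int b = 0; b < B; ++b)
                s->map_c[b][x][y] = 100 * b + 10 * x + y;
        }
    }
}

/* Copy the whole struct as one block of sizeof(my_struct) bytes.
   memcpy stands in for cudaMemcpy(dst, src, sizeof(my_struct), kind):
   both only see a byte count, never the member layout. */
void copy_struct(my_struct *dst, const my_struct *src) {
    memcpy(dst, src, sizeof(my_struct));
}
```

On the device side the same idea would look like this, assuming h is a filled host instance: my_struct *d; cudaMalloc((void **)&d, sizeof(my_struct)); cudaMemcpy(d, &h, sizeof(my_struct), cudaMemcpyHostToDevice); after which a kernel that receives d can index d->map_c[b][x][y] directly, with no pitched allocation involved.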

That is, is cudaMallocPitch() used internally to ensure array padding?

No. The two things have no relation to one another. The APIs know nothing about the structures you are allocating, just their size. All internal memory layout and alignment is decided by the compiler.
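To make "all internal memory layout and alignment is decided by the compiler" concrete, the following sketch (host C, with hypothetical sizes and helper names) computes the byte offset of map_c[b][x][y] by hand, from offsetof() and the row-major rule for fixed-size arrays, and compares it with the offset the compiler actually uses. Because every offset is fixed at compile time, indexing a struct member in a kernel is just this arithmetic, and there is no runtime pitch for an allocation API to supply. It assumes that host and device code agree on the layout, as nvcc arranges for plain structs of int and float like this one.

```c
#include <stddef.h>

/* Hypothetical sizes; the question leaves N, M and B unspecified. */
#define N 4
#define M 3
#define B 2

typedef struct {
    float data[N];
    int   map_a[N][M];
    int   map_b[N][M];
    int   map_c[B][N][M];
} my_struct;

/* Byte offset of map_c[b][x][y] computed by hand: the compiler-chosen
   member offset plus the row-major index (b*N + x)*M + y, in ints. */
size_t manual_offset(int b, int x, int y) {
    return offsetof(my_struct, map_c)
         + ((size_t)(b * N + x) * M + (size_t)y) * sizeof(int);
}

/* Byte offset of the same element as the compiler actually places it. */
size_t compiler_offset(const my_struct *s, int b, int x, int y) {
    return (size_t)((const char *)&s->map_c[b][x][y] - (const char *)s);
}
```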

As I suspect the answer to the first question is 'no', how could I correctly allocate a struct of multidimensional arrays on the device using cudaMallocPitch()? And how could I properly cudaMemcpy() that struct between host and device?

You can't. Not under any circumstances. The layout of your structure is statically defined by the compiler. There is nothing the pitched APIs could possibly help you with.

Some answers on Stack Overflow about issues with 2D arrays suggest flattening the arrays. I have tried it and have no problems with flattened arrays. Is that the only way to properly use a struct of multidimensional arrays in CUDA?

Again, all of that is irrelevant to whatever your actual problem is. Array flattening matters for multidimensional arrays built from arrays of pointers, nested vectors and the like. It has no relationship to anything in the code you have shown.
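The distinction drawn above can be sketched as follows, again in host C with hypothetical helper names. alloc_rows() builds a 2D array as an array of row pointers, where each row is a separate allocation, so there is no single block that one cudaMemcpy could copy; flatten() packs the same values into one rows * cols buffer indexed as r * cols + c. The fixed-size arrays in my_struct are already laid out like the flattened buffer, which is why flattening changes nothing for them.

```c
#include <stdlib.h>

/* "Array of pointers" 2D array: each row is a separate allocation, so the
   data is NOT one contiguous block. Copying the row-pointer table to the
   GPU would copy host addresses, not the data they point to. */
int **alloc_rows(int rows, int cols) {
    int **a = malloc((size_t)rows * sizeof *a);
    if (!a) return NULL;
    for (int r = 0; r < rows; ++r) {
        a[r] = calloc((size_t)cols, sizeof **a);
        if (!a[r]) {            /* roll back rows allocated so far */
            while (r--) free(a[r]);
            free(a);
            return NULL;
        }
    }
    return a;
}

void free_rows(int **a, int rows) {
    for (int r = 0; r < rows; ++r) free(a[r]);
    free(a);
}

/* Flattening: pack the rows into one buffer of rows*cols ints, indexed
   as r * cols + c. A buffer like this can be moved with a single
   cudaMemcpy of rows * cols * sizeof(int) bytes. */
int *flatten(int *const *a, int rows, int cols) {
    int *flat = malloc((size_t)rows * cols * sizeof *flat);
    if (!flat) return NULL;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            flat[(size_t)r * cols + c] = a[r][c];
    return flat;
}
```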
