cudaMalloc of a 2D array

Question

I'm trying to copy a 2D matrix from host to device. I wrote this:

    int dev=0;
    cudaSetDevice(dev);

    uint16_t * dev_matrix;
    size_t pitch;
    cudaMallocPitch(&dev_matrix,&pitch, 1024*sizeof(uint16_t), 65536);
    cudaMemcpy2D(dev_matrix, pitch, matrix, 1024*sizeof(uint16_t),  1024*sizeof(uint16_t), 65536, cudaMemcpyHostToDevice);
    //kernel function to implement
    cudaFree(dev_matrix);
    free (matrix);

matrix is a 2D uint16_t vector (1024x65536). This code crashes with a segmentation fault, and I can't understand why.

Robert Crovella · Accepted Answer

A matrix allocated like this cannot be used as the source of a single cudaMemcpy operation:

    uint16_t **matrix = new uint16_t*[1024];
    for(int h = 0; h < 1024; ++h) matrix[h] = new uint16_t[65536];

Each call to new in host code creates a separate allocation, and there is no guarantee that these will be contiguous or adjacent. Therefore we cannot pass a single pointer to cudaMemcpy2D and expect it to be able to discover where all the allocations are. cudaMemcpy2D expects a single, contiguous allocation.

Note that cudaMemcpy2D expects a single pointer (*) and you are passing a double pointer (**).
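
To make the contiguity requirement concrete, here is a host-only sketch of the row addressing that a pitched 2D copy performs. The function copy2d_host is a hypothetical stand-in written for illustration, not CUDA's implementation; it assumes the pitches and the width are given in bytes, as they are for cudaMemcpy2D. Row r is read from src + r * spitch, so the source must be one block in which that address really is row r. With separately allocated rows, src is only a table of pointers, and the copy reads past its end, which is consistent with the segmentation fault in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host-side stand-in for the row addressing of a pitched 2D copy.
// Row r is read from src + r * spitch and written to dst + r * dpitch.
// All sizes except height are in bytes. This is an illustration only.
void copy2d_host(void* dst, std::size_t dpitch,
                 const void* src, std::size_t spitch,
                 std::size_t width_bytes, std::size_t height) {
    // A row cannot be wider than the distance between consecutive rows.
    assert(width_bytes <= dpitch && width_bytes <= spitch);
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t r = 0; r < height; ++r) {
        // Only valid if src is a single allocation of at least
        // (height - 1) * spitch + width_bytes bytes.
        std::memcpy(d + r * dpitch, s + r * spitch, width_bytes);
    }
}
```

The destination pitch may be larger than the row width, which is what cudaMallocPitch returns when it pads rows for alignment; the padding bytes are simply skipped.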

The simplest solution is to flatten your matrix like this:

    uint16_t *matrix = new uint16_t[1024*65536];

and use index arithmetic for 2D access.
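
As a minimal sketch of that index arithmetic, assuming a row-major layout: the helper names flat_index and make_flat_matrix are hypothetical, and the fill pattern (r + c) is a placeholder chosen only so the result can be checked. A std::vector is used here instead of new[] so the buffer frees itself; the storage is contiguous either way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: offset of element (row, col) in a row-major flat buffer
// whose rows each hold `cols` elements.
inline std::size_t flat_index(std::size_t row, std::size_t col, std::size_t cols) {
    assert(col < cols);
    return row * cols + col;
}

// Build a rows x cols matrix as ONE contiguous allocation.
// Element (r, c) is set to r + c, a placeholder pattern for illustration.
std::vector<std::uint16_t> make_flat_matrix(std::size_t rows, std::size_t cols) {
    std::vector<std::uint16_t> m(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            m[flat_index(r, c, cols)] = static_cast<std::uint16_t>(r + c);
        }
    }
    return m;
}
```

With a buffer like this, m.data() is a single uint16_t* that can be passed as the source pointer of the 2D copy, with a source pitch of cols * sizeof(uint16_t) bytes, since the host rows carry no padding.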
