Reputation: 19
I'm trying to copy a 2D matrix from host to device. I wrote this:
int dev=0;
cudaSetDevice(dev);
uint16_t * dev_matrix;
size_t pitch;
cudaMallocPitch(&dev_matrix,&pitch, 1024*sizeof(uint16_t), 65536);
cudaMemcpy2D(dev_matrix, pitch, matrix, 1024*sizeof(uint16_t), 1024*sizeof(uint16_t), 65536, cudaMemcpyHostToDevice);
//kernel function to implement
cudaFree(dev_matrix);
free (matrix);
matrix is a 2D uint16_t vector (1024x65536). This code gives me a segmentation fault, and I can't understand why.
Upvotes: 0
Views: 881
Reputation: 151799
This cannot be used as the source of a single cudaMemcpy operation:
uint16_t **matrix = new uint16_t*[1024];
for(int h = 0; h < 1024; ++h) matrix[h] = new uint16_t[65536];
Each call to new in host code creates a separate allocation, and there is no guarantee that these will be contiguous or adjacent. Therefore we cannot pass a single pointer to cudaMemcpy2D and expect it to be able to discover where all the allocations are. cudaMemcpy2D expects a single, contiguous allocation.
Note that cudaMemcpy2D expects a single pointer (*) and you are passing a double pointer (**).
The simplest solution is to flatten your matrix like this:
uint16_t *matrix = new uint16_t[1024*65536];
and use index arithmetic for 2D access.
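In case the index arithmetic is unclear, here is a minimal host-side sketch. FlatMatrix, at() and row_bytes() are hypothetical names invented for illustration, not part of CUDA, and the layout is assumed to be row-major with no padding between rows:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a CUDA type): a row-major 2D matrix kept in
// one contiguous host buffer instead of one allocation per row.
struct FlatMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::uint16_t> data;  // rows * cols elements, zero-initialised

    FlatMatrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c) {}

    // Element (row, col) lives at offset row * cols + col in the flat buffer.
    std::uint16_t& at(std::size_t row, std::size_t col) {
        assert(row < rows && col < cols);
        return data[row * cols + col];
    }

    // Width of one host row in bytes; rows are tightly packed on the host.
    std::size_t row_bytes() const { return cols * sizeof(std::uint16_t); }
};
```

With a layout like this, data.data() is a single uint16_t* to one contiguous allocation, which is the kind of source pointer cudaMemcpy2D expects. Under the no-padding assumption, row_bytes() would serve as both the source pitch and the copy width in bytes, with rows as the height.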
Upvotes: 1