Reputation: 13
int main() {
char** hMat,* dArr;
hMat = new char*[10];
for (int i=0;i<10;i++) {
hMat[i] = new char[10];
}
cudaMalloc((void**)&dArr,100);
// Copy from dArr to hMat here:
}
I have an array, dArr
on the GPU, and I want to copy it into a 2D array hMat
on the host, where the first 10 fields in the GPU array are copied to the first row in the host matrix, and the next 10 fields are copied to the second row, and so on.
There are some functions in the documentation, namely CudaMemcpy2D
and CudaMemcpy2DFromArray
, but I'm not quite sure how they should be used.
Upvotes: 0
Views: 1089
Reputation: 151799
Your allocation scheme (an array of pointers, separately allocated) has the potential to create a discontiguous allocation on the host. There are no cudaMemcpy
operations of any type (including the ones you mention) that can target an arbitrarily discontiguous area, which your allocation scheme has the potential to create.
In a nutshell, then, your approach is troublesome. It can be made to work, but will require a loop to perform the copying -- essentially one cudaMemcpy
operation per "row" of your "2D array". If you choose to do that, presumably you don't need help. It's quite straightforward.
What I will suggest is that you instead modify your host allocation to create an underlying contiguous allocation. Such a region can be handled by a single, ordinary cudaMemcpy
call, but you can still treat it as a "2D array" in host code.
The basic idea is to create a single allocation of the correct overall size, then to create a set of pointers to specific places within the single allocation, where each "row" should start. You then reference into this pointer array using your initial double-pointer.
Something like this:
#include <stdio.h>
typedef char mytype;
int main(){
const int rows = 10;
const int cols = 10;
mytype **hMat = new mytype*[rows];
hMat[0] = new mytype[rows*cols];
for (int i = 1; i < rows; i++) hMat[i] = hMat[i-1]+cols;
//initialize "2D array"
for (int i = 0; i < rows; i++)
for (int j = 0; j < cols; j++)
hMat[i][j] = 0;
mytype *dArr;
cudaMalloc(&dArr, rows*cols*sizeof(mytype));
//copy to device
cudaMemcpy(dArr, hMat[0], rows*cols*sizeof(mytype), cudaMemcpyHostToDevice);
//kernel call
//copy from device
cudaMemcpy(hMat[0], dArr, rows*cols*sizeof(mytype), cudaMemcpyDeviceToHost);
return 0;
}
Upvotes: 1