Reputation: 607
I have a HostMatrix which was declared as:
float **HostMatrix
I have to copy the contents of a device matrix, pointed to by devicePointer,
to the two-dimensional host matrix HostMatrix.
I tried this:
for (int i = 0; i < numberOfRows; i++) {
    cudaMemcpy(HostMatrix[i], devicePointer, numberOfColumns * sizeof(float),
               cudaMemcpyDeviceToHost);
    devicePointer += numberOfColumns; // so as to reach next row
}
But this will be wrong, since I am doing this inside a host function, and devicePointer cannot be manipulated directly in a host function the way I do in the last line.
So what is the correct way to achieve this?
Edit
Oh, actually this does work correctly! The problem only comes when deallocating the memory, as discussed in my earlier question: CUDA: Invalid Device Pointer error when reallocating memory. So basically the following is incorrect:
for (int i = 0; i < numberOfRows; i++) {
    cudaMemcpy(HostMatrix[i], devicePointer, numberOfColumns * sizeof(float),
               cudaMemcpyDeviceToHost);
    devicePointer += numberOfColumns; // so as to reach next row
}
cudaFree(devicePointer); // invalid device pointer
Upvotes: 0
Views: 290
Reputation: 583
You first need to allocate devicePointer with all the required memory. Incrementing it on every iteration is not a good idea, though, because the free at the end will then be broken. Say you have nRows rows of nCols floats each. Then this should work (I haven't tried it, but the idea should be OK):
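float* dPtr;
cudaMalloc(&dPtr, nRows * nCols * sizeof(float)); // size is given in bytes
// ... fill dPtr on the device (kernel launch, etc.) ...
for (int i = 0; i < nRows; i++) {
    // dPtr itself is never modified; only the row offset changes
    cudaMemcpy(HostMatrix[i], dPtr + i * nCols, nCols * sizeof(float), cudaMemcpyDeviceToHost);
}
// do whatever you want
cudaFree(dPtr);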
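The issue is that if you keep incrementing dPtr, the cudaFree at the end no longer receives the address that cudaMalloc returned (after the loop, the pointer has moved past the last row), so you get the invalid device pointer error.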
Does it make sense?
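If you would rather keep the shape of your original loop, another option is to advance a copy of the pointer and leave the base address alone. Here is a minimal sketch of that idea. The helper name copyRowsToHost and the local rowCursor are made up for illustration, and I am assuming the device matrix is stored row-major in one contiguous cudaMalloc allocation and that every HostMatrix[i] already points to room for numberOfColumns floats:

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): copies a contiguous,
// row-major device matrix into a host array of row pointers, one row per
// cudaMemcpy call. devicePointer is passed by value and never modified,
// so the caller can still hand the same address to cudaFree afterwards.
cudaError_t copyRowsToHost(float **HostMatrix, const float *devicePointer,
                           int numberOfRows, int numberOfColumns)
{
    const float *rowCursor = devicePointer; // advance this copy, not the base
    for (int i = 0; i < numberOfRows; i++) {
        cudaError_t err = cudaMemcpy(HostMatrix[i], rowCursor,
                                     numberOfColumns * sizeof(float),
                                     cudaMemcpyDeviceToHost);
        if (err != cudaSuccess) {
            return err; // stop at the first failed copy
        }
        rowCursor += numberOfColumns; // plain pointer arithmetic on the host
    }
    return cudaSuccess;
}
```

After calling copyRowsToHost(HostMatrix, devicePointer, numberOfRows, numberOfColumns), cudaFree(devicePointer) gets exactly the address that cudaMalloc returned. Returning the cudaError_t is just one way to surface a failed copy; adapt it to whatever error handling you already use.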
Upvotes: 2