I am having trouble figuring out how to retrieve a 3D array from the GPU. I want to allocate the memory for the 3D array in the host code, call the kernel, where the array will be populated, and then retrieve the 3D array in the host code into a return variable of the mexFunction (host code).
I have made several attempts at it; here is my latest code. The results are all 0s, where they should be 7s. Can anyone tell me where I'm going wrong? It might have something to do with the 3D parameters; I don't think I fully understand that part.
/* Device code */
__global__ void simulate3DArrays(cudaPitchedPtr devPitchedPtr,
                                 int width,
                                 int height,
                                 int depth)
{
    /* One thread per depth slice (launched as <<<1, depth>>>) */
    int threadId;
    threadId = (blockIdx.x * blockDim.x) + threadIdx.x;
    size_t pitch = devPitchedPtr.pitch;

    for (int widthIndex = 0; widthIndex < width; widthIndex++) {
        for (int heightIndex = 0; heightIndex < height; heightIndex++) {
            /* slice threadId, row heightIndex, element widthIndex */
            *((double*)(((char*)devPitchedPtr.ptr + threadId * pitch * height) + heightIndex * pitch) + widthIndex) = 7.0;
        }
    }
}
/* Host code */
#include <stdio.h>
#include "mex.h"
/* Kernel function */
#include "simulate3DArrays.cpp"
/* Define some constants. */
#define width 5
#define height 9
#define depth 6
void displayMemoryAvailability(mxArray **MatlabMemory);
void mexFunction(int nlhs,
                 mxArray *plhs[],
                 int nrhs,
                 const mxArray *prhs[])
{
    double *output;
    mwSize ndim3 = 3;
    mwSize dims3[] = {height, width, depth};

    plhs[0] = mxCreateNumericArray(ndim3, dims3, mxDOUBLE_CLASS, mxREAL);
    output = mxGetPr(plhs[0]);

    cudaExtent extent = make_cudaExtent(width * sizeof(double), height, depth);
    cudaPitchedPtr devicePointer;
    cudaMalloc3D(&devicePointer, extent);

    simulate3DArrays<<<1,depth>>>(devicePointer, width, height, depth);

    cudaMemcpy3DParms deviceOuput = { 0 };
    deviceOuput.srcPtr.ptr = devicePointer.ptr;
    deviceOuput.srcPtr.pitch = devicePointer.pitch;
    deviceOuput.srcPtr.xsize = width;
    deviceOuput.srcPtr.ysize = height;
    deviceOuput.dstPtr.ptr = output;
    deviceOuput.dstPtr.pitch = devicePointer.pitch;
    deviceOuput.dstPtr.xsize = width;
    deviceOuput.dstPtr.ysize = height;
    deviceOuput.kind = cudaMemcpyDeviceToHost;

    /* copy 3D array back to 'output' */
    cudaMemcpy3D(&deviceOuput);

    cudaFree(devicePointer.ptr);
} /* End Mexfunction */
The basic problem appears to be that you are instructing cudaMemcpy3D to copy zero bytes: you have not set a non-zero extent, and the extent is what tells the API how large the transfer is.
Your transfer could probably be as simple as:
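cudaMemcpy3DParms deviceOuput = { 0 };
deviceOuput.srcPtr = devicePointer;
/* The host array from mxCreateNumericArray is dense: pitch = width * sizeof(double) */
deviceOuput.dstPtr = make_cudaPitchedPtr(output, width * sizeof(double), width, height);
deviceOuput.extent = extent;
deviceOuput.kind = cudaMemcpyDeviceToHost;
cudaMemcpy3D(&deviceOuput);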
I can't comment on whether your use of the MEX interface is correct. The kernel looks superficially correct, and I don't see anything else obviously wrong, but I can't verify that without compiling and running your code with MATLAB, which I am unable to do.