Reputation: 1176
In CuPy it is possible to allocate a multi-dimensional ndarray on the host, and then copy it to the GPU using CUDA. My questions are:

- Does CuPy ever use pitched allocations, i.e. cudaMalloc2D, cudaMalloc3D, or cudaMallocPitch? If not, why not?
- Is the pitch reflected in cupy.ndarray.strides?

My goal is to copy a 2D array with a given width and height to global memory (not texture memory - which is supported). In C++ I could do that with something like:
float* devPtr = nullptr;
size_t devPitch;
cudaMallocPitch((void **) &devPtr, &devPitch, sizeof(float) * width, height);
cudaMemcpy2D(devPtr, devPitch, my_array.data(),
             width * sizeof(float), width * sizeof(float), height,
             cudaMemcpyHostToDevice);
But I cannot find a way in CuPy that guarantees the pitched layout my custom kernel requires. I tried to "use the source, Luke" to find out what was really happening, but could not find any call into the CUDA API that would produce such an allocation.
Upvotes: 1
Views: 337
Reputation: 316
Pitched allocation is too specialized for a general-purpose library: CuPy supports a range of use cases where arrays are reshaped and views with different strides are created. Moreover, some applications require the data to be contiguous, and with pitched allocations CUDA automatically introduces padding between the dimensions.
You can emulate this behavior yourself by allocating an array of shape (height, pitch) and taking a view of shape (height, width). The pitch should be chosen so that each row starts at an address with the alignment required for the desired data type (and, ideally, the device's preferred pitch alignment).
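A minimal sketch of that emulation (the helper name `alloc_pitched` and the 512-byte alignment are assumptions, not CuPy API; real devices report their preferred pitch through the device properties). It is shown with NumPy so it runs anywhere, but the same code works unchanged with `import cupy as xp`:

```python
import numpy as np  # stand-in for `import cupy as xp`; the slicing API is the same

def alloc_pitched(xp, width, height, dtype, alignment=512):
    """Allocate a (height, pitch) buffer and return it plus a (height, width) view.

    `alignment` is in bytes; 512 is an assumed value -- query the device
    for its actual preferred pitch alignment in real code.
    """
    itemsize = xp.dtype(dtype).itemsize
    row_bytes = width * itemsize
    # Round each row up to the next multiple of `alignment`, like cudaMallocPitch.
    pitch_bytes = ((row_bytes + alignment - 1) // alignment) * alignment
    buf = xp.empty((height, pitch_bytes // itemsize), dtype=dtype)
    # The view exposes only the first `width` elements of each padded row.
    return buf, buf[:, :width]

buf, view = alloc_pitched(np, 100, 4, np.float32)
print(view.shape, view.strides)  # (4, 100) (512, 4): rows are 512 bytes apart
```

Keep `buf` alive for the lifetime of the view; passing `buf` (or its pointer) together with the pitch in bytes to a raw kernel reproduces the usual `row = (char*)ptr + r * pitch` indexing.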
Upvotes: 2