Reputation: 1176
In CuPy it is possible to allocate a multi-dimensional ndarray on the host, and then copy it to the GPU using CUDA. My questions are:

- Does CuPy ever use pitched allocations, i.e. cudaMalloc2D, cudaMalloc3D, or cudaMallocPitch? If not, why not?
- Is the pitch reflected in cupy.ndarray.strides?

My goal is to copy a 2D array with a given width and height to global memory (not texture memory - which is supported). In C++ I could do that with something like:
float* devPtr = nullptr;
size_t devPitch;
cudaMallocPitch((void **) &devPtr, &devPitch, sizeof(float) * width, height);
cudaMemcpy2D(devPtr, devPitch, my_array.data(),
             width * sizeof(float), width * sizeof(float), height,
             cudaMemcpyHostToDevice);
But I cannot find a way in CuPy that guarantees the pitched layout my custom kernel requires. I tried to "use the source, Luke" to find out what was really happening, but could not find any call into the CUDA API that would produce such an allocation.
Upvotes: 1
Views: 337
Reputation: 316
Pitched allocation is too specialized for a general-purpose library: CuPy supports a range of use cases where arrays are reshaped and views with different strides are created. Moreover, some applications require the data to be contiguous, and with pitched allocations CUDA automatically introduces padding between the dimensions.
You can emulate this behavior yourself by allocating an array of shape (height, pitch) and taking a view of shape (height, width). The pitch should be chosen so that each row starts at an address with the alignment required for the desired data type (and, ideally, the device's preferred pitch alignment).
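A minimal sketch of that emulation (the helper name `alloc_pitched` and the 512-byte alignment are assumptions, not CuPy API; real devices report their preferred pitch through the device properties). It is shown with NumPy so it runs anywhere, but the same code works unchanged with `import cupy as xp`:

```python
import numpy as np  # stand-in for `import cupy as xp`; the slicing API is the same

def alloc_pitched(xp, width, height, dtype, alignment=512):
    """Allocate a (height, pitch) buffer and return it plus a (height, width) view.

    `alignment` is in bytes; 512 is an assumed value -- query the device
    for its actual preferred pitch alignment in real code.
    """
    itemsize = xp.dtype(dtype).itemsize
    row_bytes = width * itemsize
    # Round each row up to the next multiple of `alignment`, like cudaMallocPitch.
    pitch_bytes = ((row_bytes + alignment - 1) // alignment) * alignment
    buf = xp.empty((height, pitch_bytes // itemsize), dtype=dtype)
    # The view exposes only the first `width` elements of each padded row.
    return buf, buf[:, :width]

buf, view = alloc_pitched(np, 100, 4, np.float32)
print(view.shape, view.strides)  # (4, 100) (512, 4): rows are 512 bytes apart
```

Keep `buf` alive for the lifetime of the view; passing `buf` (or its pointer) together with the pitch in bytes to a raw kernel reproduces the usual `row = (char*)ptr + r * pitch` indexing.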
Upvotes: 2