Tyler Martin

Passing CuPy CUDA device pointer to pybind11

I am trying to instantiate an array in GPU memory using CuPy and then pass the pointer to this array to C++ using pybind11.

A minimal example of the problem I am running into is shown below.

Python

import demolib #compiled pybind11 library
import cupy as cp

x = cp.ones(100000, dtype=cp.float32)
y = cp.ones(100000, dtype=cp.float32)

demolib.pyadd(len(x),x.data.ptr,y.data.ptr)
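
Separately from the pointer problem, the element type also has to match. `cp.ones` defaults to float64, but `cuadd` reads `float` (float32) values, so it would reinterpret 8-byte doubles as pairs of 4-byte floats. The sketch below uses NumPy as a host-side stand-in for CuPy (an assumption made so it runs without a GPU; CuPy follows NumPy's dtype defaults). It shows what a `float *` kernel sees when it is handed a float64 buffer of ones.

```python
import numpy as np

# np.ones (like cp.ones) defaults to float64: 8 bytes per element.
x64 = np.ones(4)

# Reinterpret the same bytes as float32, which is what passing the buffer
# to a `float *` parameter amounts to: no conversion, just a new view.
as_f32 = x64.view(np.float32)
# Each 1.0 double splits into a low word that reads as 0.0 and a high
# word that reads as 1.875, so the kernel never sees the intended 1.0s.

# Allocating with the dtype the kernel expects avoids the mismatch.
x32 = np.ones(4, dtype=np.float32)
```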

C++/CUDA

#include <cstdio>   // fprintf
#include <cstdlib>  // exit
#include <iostream>
#include <math.h>
#include <cuda_runtime.h>
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

namespace py = pybind11;

// Error Checking Function
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
   if (code != cudaSuccess)
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);
   }
}

// Simple CUDA kernel
__global__
void cuadd(int n, float *x, float *y)
{
  int index = blockIdx.x * blockDim.x + threadIdx.x;
  int stride = blockDim.x * gridDim.x;
  for (int i = index; i < n; i += stride)
    y[i] = x[i] + y[i];
}

// Simple wrapper function to be exposed to Python
int pyadd(int N, float *x, float *y)
{

  // Launch enough 256-thread blocks to cover all N elements on the GPU
  int blockSize = 256;
  int numBlocks = (N + blockSize - 1) / blockSize;
  cuadd<<<numBlocks, blockSize>>>(N,x,y);

  // Wait for GPU to finish before accessing on host
  gpuErrchk( cudaPeekAtLastError() );
  gpuErrchk( cudaDeviceSynchronize() );

  return 0;
}

PYBIND11_MODULE(demolib, m) {
        m.doc() = "pybind11 example plugin"; // optional module docstring
        m.def("pyadd", &pyadd, "Adds array x into array y on the GPU (y = x + y)");
}

The code throws the following error:

GPUassert: an illegal memory access was encountered /home/tbm/cuda/add_pybind.cu 47
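
Kernel launches are asynchronous, so the illegal access is reported by whichever `gpuErrchk` call first observes it, usually the one wrapping `cudaDeviceSynchronize`. The line number comes from `__LINE__` at that call site, not from the kernel line itself. The launch arithmetic can be ruled out on the host. The plain-Python sketch below replays `cuadd`'s grid-stride loop for N = 100000 with the same block size and rounding as `pyadd`, and checks that every index in [0, N) is visited exactly once. This shifts the suspicion toward the pointer values rather than the indexing. `grid_stride_indices` is a hypothetical helper written only for this check.

```python
def grid_stride_indices(n, block_size=256):
    """Yield every index cuadd touches, mimicking its grid-stride loop on the host."""
    num_blocks = (n + block_size - 1) // block_size  # same rounding-up as pyadd
    stride = block_size * num_blocks                 # blockDim.x * gridDim.x
    for block in range(num_blocks):                  # blockIdx.x
        for thread in range(block_size):             # threadIdx.x
            i = block * block_size + thread          # index
            while i < n:                             # for (i = index; i < n; i += stride)
                yield i
                i += stride


n = 100000
touched = list(grid_stride_indices(n))
# Each element is processed exactly once and nothing outside [0, n) is accessed.
```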

I realize that this specific example could be implemented with a CuPy user-defined kernel, but the end goal is to pass CuPy arrays zero-copy into a larger codebase that would be prohibitively expensive to rewrite in that paradigm.

I have also located this GitHub issue, which is the reverse of what I'm trying to do.

Answers (1)

Tyler Martin

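The fix was to change the pointer arguments of pyadd to an integer type and cast those integers to float pointers, as shown below. As pointed out in the comments, this was figured out by referencing another question (unanswered at the time of posting). The integer type must be wide enough to hold a 64-bit device address. std::uintptr_t is the portable choice, because long is only 32 bits on 64-bit Windows.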

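#include <cstdint>  // std::uintptr_t

int pyadd(int N, std::uintptr_t px, std::uintptr_t py)
{
  // CuPy's x.data.ptr arrives as a plain Python int; reinterpret it as a device pointer.
  float *x = reinterpret_cast<float*>(px);
  float *y = reinterpret_cast<float*>(py);

  // ... (rest of the function as in the question)
}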

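
Because pyadd now receives bare integers, nothing on the C++ side can check that the arrays are float32, contiguous, or the same length. One option is to validate on the Python side before calling it. The sketch below assumes the inputs implement the CUDA Array Interface, which CuPy arrays expose as `__cuda_array_interface__`. `float32_device_args` and `FakeGpuArray` are hypothetical names. The fake class only lets the sketch run without a GPU.

```python
class FakeGpuArray:
    """Hypothetical stand-in for a CuPy array so this sketch runs without a GPU."""

    def __init__(self, ptr, n, typestr="<f4"):
        # Minimal CUDA Array Interface dict; a real CuPy array supplies its own.
        self.__cuda_array_interface__ = {
            "shape": (n,),
            "typestr": typestr,
            "data": (ptr, False),  # (device address, read-only flag)
            "strides": None,       # None means C-contiguous
            "version": 3,
        }


def float32_device_args(*arrays):
    """Return (n, ptr_0, ptr_1, ...) for equally sized 1-D float32 GPU arrays.

    Hypothetical helper: checks each array through __cuda_array_interface__
    before its raw address is handed to an integer-typed pyadd.
    """
    n = None
    ptrs = []
    for a in arrays:
        iface = a.__cuda_array_interface__
        if iface["typestr"] != "<f4":
            raise TypeError(f"expected float32 ('<f4'), got {iface['typestr']!r}")
        if len(iface["shape"]) != 1:
            raise ValueError("expected a 1-D array")
        if iface.get("strides") not in (None, (4,)):
            raise ValueError("expected a contiguous array")
        if n is None:
            n = iface["shape"][0]
        elif iface["shape"][0] != n:
            raise ValueError("arrays must have the same length")
        ptr, _readonly = iface["data"]
        ptrs.append(ptr)
    return (n, *ptrs)


args = float32_device_args(FakeGpuArray(0x7F0000000000, 8), FakeGpuArray(0x7F0000001000, 8))
# With real CuPy arrays (untested assumption): demolib.pyadd(*float32_device_args(x, y))
```

Doing the check in Python keeps the C++ signature unchanged. Alternatively, the binding could accept a py::object and read the same attribute in C++, at the cost of more binding code.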