Reputation: 61
I am trying to instantiate an array in GPU memory using CuPy and then pass the pointer to this array to C++ using pybind11.
A minimal example of the problem I am running into is shown below.
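import demolib  # compiled pybind11 library
import cupy as cp
# float32 matches the kernel's float* arguments (cp.ones defaults to float64)
x = cp.ones(100000, dtype=cp.float32)
y = cp.ones(100000, dtype=cp.float32)
demolib.pyadd(len(x), x.data.ptr, y.data.ptr)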
#include <cstdio>   // fprintf
#include <cstdlib>  // exit
#include <cuda_runtime.h>
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
namespace py = pybind11;
// Error Checking Function
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}
// Simple CUDA kernel
__global__
void cuadd(int n, float *x, float *y)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}
// Simple wrapper function to be exposed to Python
int pyadd(int N, float *x, float *y)
{
    // Launch enough 256-thread blocks to cover all N elements
    int blockSize = 256;
    int numBlocks = (N + blockSize - 1) / blockSize;
    cuadd<<<numBlocks, blockSize>>>(N, x, y);
    // Check for launch errors, then wait for the GPU to finish
    gpuErrchk( cudaPeekAtLastError() );
    gpuErrchk( cudaDeviceSynchronize() );
    return 0;
}
PYBIND11_MODULE(demolib, m) {
    m.doc() = "pybind11 example plugin"; // optional module docstring
    m.def("pyadd", &pyadd, "A function which adds two arrays element-wise");
}
The code throws the following error:
GPUassert: an illegal memory access was encountered /home/tbm/cuda/add_pybind.cu 47
I realize that this specific example could be implemented using a CuPy user-defined kernel, but the end goal is to pass CuPy arrays with zero copies into a larger codebase that would be prohibitive to rewrite in this paradigm.
I have also located this GitHub Issue, which is the reverse of what I'm trying to do.
Upvotes: 2
Views: 1536
Reputation: 61
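The fix was to change the pointer arguments of pyadd to an integer type (long here) and cast those integers to float pointers with reinterpret_cast, as shown below. CuPy's x.data.ptr is a plain Python int holding the device address, so the C++ side has to accept it as an integer rather than as float*. As pointed out in the comments, this was figured out by referencing another question (unanswered at the time of posting).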
int pyadd(int N, long px, long py)
{
    float *x = reinterpret_cast<float*>(px);
    float *y = reinterpret_cast<float*>(py);
.
.
.
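For anyone who wants to try the integer-to-pointer round trip without a GPU, here is a minimal host-only sketch of the same idea. It is not the original module: add_host, pyadd_host and address_of are hypothetical names, the kernel is replaced by a plain loop, and std::uintptr_t is swapped in for long as an assumption on my part (long is only 32 bits on 64-bit Windows, so it could truncate an address there).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-only stand-in for the CUDA kernel: y[i] = x[i] + y[i].
static void add_host(int n, const float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = x[i] + y[i];
}

// Same shape as the fixed pyadd: the addresses arrive as integers
// (the way x.data.ptr does on the Python side) and are cast back to pointers.
// Assumption: std::uintptr_t instead of long, so the integer is pointer-sized.
int pyadd_host(int n, std::uintptr_t px, std::uintptr_t py)
{
    assert(px != 0 && py != 0);  // a zero address cannot be a valid buffer
    const float *x = reinterpret_cast<const float *>(px);
    float *y = reinterpret_cast<float *>(py);
    add_host(n, x, y);
    return 0;
}

// Hypothetical helper that mimics the Python side: expose an address as an integer.
std::uintptr_t address_of(float *p)
{
    return reinterpret_cast<std::uintptr_t>(p);
}
```

In the real module the loop would be the cuadd kernel launch again, and the Python call would presumably stay demolib.pyadd(len(x), x.data.ptr, y.data.ptr), since pybind11 converts Python ints to C++ integer parameters.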
Upvotes: 4