C++ class dll with CUDA member?

Question

I have a C++ class-based DLL and would like to convert some of its member functions to CUDA-based operations.

I am using VS2012, Windows 7, CUDA 6.5, and sm_20.

Say the original SuperProjector.h file is like:

class __declspec(dllexport) SuperProjector 
{
public:
    SuperProjector(){}; 
    ~SuperProjector(){};
    void sumVectors(float* c, float* a, float* b, int N);
};

and the original sumVectors() function in SuperProjector.cpp is:

void SuperProjector::sumVectors(float* c, float* a, float* b, int N)
{
    for (int n = 0; n < N; n++)
        c[n] = a[n] + b[n];
}

I am stuck on how to convert sumVectors() to CUDA. Specifically:

I read some posts saying that adding the __global__ or __device__ keywords in front of class members will work, but do I need to change the extension of the .cpp file to .cu?
I also tried to create a CUDA project from scratch, but VS2012 does not seem to offer the option of creating a DLL once I choose to create a CUDA project.

I am very confused about the best way to convert some members of this C++ class-based DLL into CUDA kernel functions. I would appreciate any ideas, or better, some very simple examples.

Nikolay K · Accepted Answer

Create a CUDA project, let's call it cudaSuperProjector, and add two files to it: cudaSuperProjector.cu and cudaSuperProjector.h.

cudaSuperProjector.h

class __declspec(dllexport) cudaSuperProjector {
public:
    cudaSuperProjector(){ }
    ~cudaSuperProjector(){ }
    void sumVectors(float* c, float* a, float* b, int N);
};
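A side note on the header: the same file is included by both the DLL project and the program that uses it. Hard-coding __declspec(dllexport) works, but a common convention is to switch between dllexport and dllimport with a macro. The following is only a sketch of that convention; CUDASUPERPROJECTOR_EXPORTS and CSP_API are hypothetical names that do not appear in the answer, and the macro would have to be defined in the DLL project only.

```cpp
// cudaSuperProjector.h (sketch of an export/import macro)
// CUDASUPERPROJECTOR_EXPORTS is a hypothetical macro name: define it only in
// the DLL project (C/C++ -> Preprocessor -> Preprocessor Definitions).
#if defined(_WIN32)
#  if defined(CUDASUPERPROJECTOR_EXPORTS)
#    define CSP_API __declspec(dllexport)   // building the DLL
#  else
#    define CSP_API __declspec(dllimport)   // consuming the DLL
#  endif
#else
#  define CSP_API                           // non-Windows: no decoration
#endif

class CSP_API cudaSuperProjector {
public:
    cudaSuperProjector() { }
    ~cudaSuperProjector() { }
    void sumVectors(float* c, float* a, float* b, int N);
};
```

The rest of the steps stay the same; only the class declaration changes.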

cudaSuperProjector.cu

#include <stdio.h>

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include "cudaSuperProjector.h"

__global__ void addKernel(float *c, const float *a, const float *b) {
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

// Helper function for using CUDA to add vectors in parallel.
// Returns the first error encountered, or cudaSuccess.
cudaError_t addWithCuda(float *c, const float *a, const float *b, unsigned int size) {
    float *dev_a = 0;
    float *dev_b = 0;
    float *dev_c = 0;
    const size_t bytes = size * sizeof(float);
    cudaError_t cudaStatus;

    // Choose which GPU to run on; change this on a multi-GPU system.
    cudaStatus = cudaSetDevice(0);
    if (cudaStatus != cudaSuccess) goto Cleanup;

    // Allocate GPU buffers for three vectors (two input, one output).
    cudaStatus = cudaMalloc((void**)&dev_c, bytes);
    if (cudaStatus != cudaSuccess) goto Cleanup;
    cudaStatus = cudaMalloc((void**)&dev_a, bytes);
    if (cudaStatus != cudaSuccess) goto Cleanup;
    cudaStatus = cudaMalloc((void**)&dev_b, bytes);
    if (cudaStatus != cudaSuccess) goto Cleanup;

    // Copy input vectors from host memory to GPU buffers.
    cudaStatus = cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice);
    if (cudaStatus != cudaSuccess) goto Cleanup;
    cudaStatus = cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice);
    if (cudaStatus != cudaSuccess) goto Cleanup;

    // Launch a kernel on the GPU with one thread per element, all in a
    // single block, so size must not exceed the per-block thread limit.
    addKernel<<<1, size>>>(dev_c, dev_a, dev_b);
    // Check for any errors launching the kernel.
    cudaStatus = cudaGetLastError();
    if (cudaStatus != cudaSuccess) goto Cleanup;
    // cudaDeviceSynchronize waits for the kernel to finish, and returns
    // any errors encountered during the launch.
    cudaStatus = cudaDeviceSynchronize();
    if (cudaStatus != cudaSuccess) goto Cleanup;

    // Copy output vector from GPU buffer to host memory.
    cudaStatus = cudaMemcpy(c, dev_c, bytes, cudaMemcpyDeviceToHost);

Cleanup:
    // cudaFree accepts null pointers, so this is safe on every path.
    cudaFree(dev_c);
    cudaFree(dev_a);
    cudaFree(dev_b);
    return cudaStatus;
}

void cudaSuperProjector::sumVectors(float* c, float* a, float* b, int N) {
    cudaError_t cudaStatus = addWithCuda(c, a, b, N);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaSuperProjector::sumVectors failed: %s\n",
                cudaGetErrorString(cudaStatus));
    }
}

Note: in the properties of the file cudaSuperProjector.cu, Item Type should be CUDA C/C++.
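One limitation worth noting: a launch of the form <<<1, size>>> puts every thread in a single block, and a block is limited to 1024 threads on sm_20, so it only works for short vectors. A common way around this is to launch several blocks and guard the index against running past the end. The sketch below emulates that indexing in plain C++ on the CPU so it can be checked without a GPU. blocksFor and addEmulated are hypothetical names, not part of the code above. In a real kernel the two loops disappear and i would be blockIdx.x * blockDim.x + threadIdx.x.

```cpp
#include <cassert>

// Hypothetical helper: number of blocks needed so that
// blocks * threadsPerBlock >= n (ceiling division).
inline unsigned int blocksFor(unsigned int n, unsigned int threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// CPU emulation of a bounds-checked, multi-block vector add.
// The outer loop stands in for blockIdx.x, the inner one for threadIdx.x.
inline void addEmulated(float* c, const float* a, const float* b,
                        unsigned int n, unsigned int threadsPerBlock) {
    const unsigned int blocks = blocksFor(n, threadsPerBlock);
    for (unsigned int block = 0; block < blocks; ++block) {
        for (unsigned int thread = 0; thread < threadsPerBlock; ++thread) {
            const unsigned int i = block * threadsPerBlock + thread;
            if (i < n) {  // guard: the last block may overshoot n
                c[i] = a[i] + b[i];
            }
        }
    }
}
```

With this scheme the launch would take blocksFor(size, threadsPerBlock) blocks instead of 1, and the kernel would need size as an extra parameter for the guard.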

Go to the project properties and, under General, set Configuration Type to Dynamic Library (.dll). Everything needed to build the library is now in place. Build the project; the output folder will contain cudaSuperProjector.dll and cudaSuperProjector.lib. Create the directory cudaSuperProjector\lib and copy both files there. Also create cudaSuperProjector\include and copy cudaSuperProjector.h into it.

Create another Visual C++ project, let's call it SuperProjector. Add file SuperProjector.cpp to the project.

SuperProjector.cpp

#include <stdio.h>

#include "cudaSuperProjector.h"

int main(int argc, char** argv) {

    float a[6] = { 0, 1, 2, 3, 4, 5 };
    float b[6] = { 1, 2, 3, 4, 5, 6 };
    float c[6] = {  };

    cudaSuperProjector csp;
    csp.sumVectors(c, a, b, 6);
    printf("c = {%f, %f, %f, %f, %f, %f}\n",
           c[0], c[1], c[2], c[3], c[4], c[5]);

    return 0;
}

In the project properties, add the path to the .dll and .lib files to VC++ Directories -> Library Directories (for example D:\cudaSuperProjector\lib;) and add the path to the header to VC++ Directories -> Include Directories (for example D:\cudaSuperProjector\include;). Then go to Linker -> Input and add cudaSuperProjector.lib; to Additional Dependencies.
Your project should now compile, but when you run it you will get this error:

The program can't start because cudaSuperProjector.dll is missing from your computer. Try reinstalling the program to fix this problem.

You need to copy cudaSuperProjector.dll to the project's output folder so that it sits next to SuperProjector.exe. You can do this manually or add
```
copy D:\cudaSuperProjector\lib\cudaSuperProjector.dll $(SolutionDir)$(Configuration)\
```
in Build Events -> Post-Build Event -> Command Line, where $(SolutionDir)$(Configuration)\ is the solution's output path (see Configuration Properties -> General -> Output Directory).
