Cuda C++: Malloc class on Device and fill it with data from the host

Question

My goal is to 'fill' a class that resides in device memory from the host. Since that class contains a pointer to data, my understanding is that, after allocating the class itself, I need to allocate the space for the data separately and then set the pointer in the device class to the newly allocated pointer. I've tried to model my solution on this post, which seems to do exactly what I want; however, I am doing something wrong and would like help.

I have the following setup of classes and relevant code:

class A {
public:
    HostB host_B;
    B *dev_B;
    void moveBToGPU();
}

class HostB {
public:
    vector<int> info;
}

class B {
public:
    int *info;
}

void A::moveBToGPU() {
    cudaMalloc(this->dev_B, sizeof(B));

    int* dev_data;
    cudaMalloc(&dev_data, sizeof(int) * host_B->info.size());

    cudaMemcpy(&this->dev_B->info, &dev_data, sizeof(int *), cudaMemcpyHostToDevice); //Not sure if correct

    //I would like to do the following, but that results in a segfault
    cudaMemcpy(this->dev_B->info, host_B->info.data(), host_B->info.size(), cudaMemcpyHostToDevice);

    //As expected, this works
    cudaMemcpy(dev_data, host_B->info.data(), host_B->info.size(), cudaMemcpyHostToDevice;

Robert Crovella · Accepted Answer

Just get rid of the line causing the segfault. The line that comes after it does what you want, correctly. The segfault arises because passing this->dev_B->info requires dereferencing a device pointer in host code (illegal), whereas dev_data is an ordinary host variable that already holds the device address. Also note that you probably want to multiply host_B->info.size() by sizeof(int), as you did with cudaMalloc.

Here is an example. Your posted code could not compile; it had numerous compile errors (in moveBToGPU). I'm not going to try to list every compile error. Please study the example below for the changes:

$ cat t1676.cu
#include <cstdio>
#include <vector>
using namespace std;
class HostB {
public:
    vector<int> info;
};

class B {
public:
    int *info;
};

class A {
public:
    HostB host_B;
    B *dev_B;
    void moveBToGPU();
};

__global__ void k(A a){

  printf("%d\n",a.dev_B->info[0]);
}

void A::moveBToGPU() {
    cudaMalloc(&dev_B, sizeof(B));

    int* dev_data;
    cudaMalloc(&dev_data, sizeof(int) * host_B.info.size());

    cudaMemcpy(&dev_B->info, &dev_data, sizeof(int *), cudaMemcpyHostToDevice); // store the device address held in dev_data into dev_B->info; only the member's address is computed on the host


    //As expected, this works
    cudaMemcpy(dev_data, host_B.info.data(), sizeof(int)*host_B.info.size(), cudaMemcpyHostToDevice);
}

int main(){

  A a;
  a.host_B.info.push_back(12);
  a.moveBToGPU();
  k<<<1,1>>>(a);
  cudaDeviceSynchronize();
}
$ nvcc -o t1676 t1676.cu
$ cuda-memcheck ./t1676
========= CUDA-MEMCHECK
12
========= ERROR SUMMARY: 0 errors
$
