Reputation: 490
My goal is to 'fill' a class that resides in device memory from the host. Since that class contains a pointer to data, my understanding is that, after allocating the class itself, I need to allocate the space for the data separately and then set the pointer in the device class to the newly allocated address. I've tried to model my solution on this post which, in my eyes, seems to do exactly what I want; however, I am doing something wrong and would like help.
I have the following setup of classes and relevant code:
class A {
public:
    HostB host_B;
    B *dev_B;
    void moveBToGPU();
}
class HostB {
public:
    vector<int> info;
}
class B {
public:
    int *info;
}
void A::moveBToGPU() {
    cudaMalloc(this->dev_B, sizeof(B));
    int* dev_data;
    cudaMalloc(&dev_data, sizeof(int) * host_B->info.size());
    cudaMemcpy(&this->dev_B->info, &dev_data, sizeof(int *), cudaMemcpyHostToDevice); //Not sure if correct
    //I would like to do the following, but that results in a segfault
    cudaMemcpy(this->dev_B->info, host_B->info.data(), host_B->info.size(), cudaMemcpyHostToDevice);
    //As expected, this works
    cudaMemcpy(dev_data, host_B->info.data(), host_B->info.size(), cudaMemcpyHostToDevice;
Upvotes: 1
Views: 279
Reputation: 152164
Just get rid of the line causing the seg fault. The line that comes after it does what you want, correctly. The segfault arises because passing this->dev_B->info as an argument requires dereferencing a device pointer in host code (illegal), whereas passing dev_data does not. Also note that you probably want to multiply host_B->info.size() by sizeof(int), as you did with cudaMalloc, because cudaMemcpy takes its size in bytes.
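If the host ever does need the value stored in dev_B->info, it has to be fetched with cudaMemcpy rather than read directly. The following is a minimal sketch of that idea, not part of the original code: the helper name fetchInfoPointer is hypothetical, B is reduced to a plain struct, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

struct B {
    int *info;  // device address of the payload
};

// Hypothetical helper (not from the original post): returns the value of
// dev_b->info without dereferencing dev_b on the host.
int *fetchInfoPointer(const B *dev_b) {
    int *info = nullptr;
    // &dev_b->info is only address arithmetic on the host; the actual read
    // of device memory is performed by cudaMemcpy.
    cudaMemcpy(&info, &dev_b->info, sizeof(int *), cudaMemcpyDeviceToHost);
    return info;  // a device address held in an ordinary host variable
}
```

The returned pointer can then be passed to cudaMemcpy as a destination, exactly like dev_data. In moveBToGPU this round trip is unnecessary, because dev_data already holds the same address on the host.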
Here is a complete example. Your posted code could not compile; it had numerous compile errors (in moveBToGPU). I'm not going to try and list every compile error. Please study the example below for the changes:
$ cat t1676.cu
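#include <cstdio>
#include <vector>
using namespace std;

class HostB {
public:
    vector<int> info;
};

class B {
public:
    int *info;
};

class A {
public:
    HostB host_B;
    B *dev_B;
    void moveBToGPU();
};

__global__ void k(A a){
    // Dereferencing dev_B is legal here because this runs on the device
    printf("%d\n",a.dev_B->info[0]);
}

void A::moveBToGPU() {
    // Allocate the B object itself in device memory
    cudaMalloc(&dev_B, sizeof(B));
    // Allocate the payload separately; dev_data is a host variable holding a device address
    int* dev_data;
    cudaMalloc(&dev_data, sizeof(int) * host_B.info.size());
    // Store that address in the info member of the device-resident B
    // (&dev_B->info only computes an address, it does not read device memory)
    cudaMemcpy(&dev_B->info, &dev_data, sizeof(int *), cudaMemcpyHostToDevice);
    // Copy the payload using the host-side copy of the device address; the size is in bytes
    cudaMemcpy(dev_data, host_B.info.data(), sizeof(int)*host_B.info.size(), cudaMemcpyHostToDevice);
}

int main(){
    A a;
    a.host_B.info.push_back(12);
    a.moveBToGPU();
    k<<<1,1>>>(a);
    cudaDeviceSynchronize();
}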
$ nvcc -o t1676 t1676.cu
$ cuda-memcheck ./t1676
========= CUDA-MEMCHECK
12
========= ERROR SUMMARY: 0 errors
$
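An alternative that avoids patching a single member in place is to assemble a B on the host whose info member already holds the device address, and then copy the whole object across in one cudaMemcpy. The sketch below illustrates that variant under a few assumptions: the helper names uploadB and downloadInfo are hypothetical, B is reduced to a plain struct, and error checking is omitted as in the example above.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

struct B {
    int *info;  // device address of the payload
};

// Hypothetical helper: builds a B in device memory via a host staging copy.
B *uploadB(const std::vector<int> &host_info) {
    // Allocate and fill the payload first (sizes are in bytes).
    int *dev_data = nullptr;
    cudaMalloc(&dev_data, sizeof(int) * host_info.size());
    cudaMemcpy(dev_data, host_info.data(), sizeof(int) * host_info.size(),
               cudaMemcpyHostToDevice);

    // Host-side B whose member holds a device address; it is never
    // dereferenced on the host, only copied.
    B staging;
    staging.info = dev_data;

    // Copy the whole struct to the device in one transfer.
    B *dev_b = nullptr;
    cudaMalloc(&dev_b, sizeof(B));
    cudaMemcpy(dev_b, &staging, sizeof(B), cudaMemcpyHostToDevice);
    return dev_b;
}

// Hypothetical helper: reads n ints back through the device-resident B.
std::vector<int> downloadInfo(const B *dev_b, std::size_t n) {
    B staging;
    // Fetch the struct so the host learns the payload's device address.
    cudaMemcpy(&staging, dev_b, sizeof(B), cudaMemcpyDeviceToHost);
    std::vector<int> out(n);
    cudaMemcpy(out.data(), staging.info, sizeof(int) * n,
               cudaMemcpyDeviceToHost);
    return out;
}
```

For a struct with a single pointer member the two approaches are equivalent. The staging copy tends to pay off when the class has several members, since one transfer then initializes all of them.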
Upvotes: 3