Reputation: 21
I want to create a P
object inside a class and have a P&
inside a second class.
This is P
:
struct P
{
int* p_;
P() { CC(cudaMalloc(&p_, sizeof(int))); }
__device__ void print() { printf("p: %d\n", p_[0]); }
};
These are my classes:
struct A
{
P a_;
};
struct B
{
B(A& a) : b_(a.a_) {}
P& b_;
};
This is how I use P
on the device:
__global__ void Ka(A a)
{
a.a_.p_[0] = 399;
a.a_.print();
}
__global__ void Kb(B b)
{
b.b_.print();
}
And this is how I call it:
A a;
Ka << <1, 1 >> > (a);
CC(cudaDeviceSynchronize());
CC(cudaGetLastError());
B b(a);
Kb << <1, 1 >> > (b);
CC(cudaDeviceSynchronize());
CC(cudaGetLastError());
I'm confused because while on the debugger it appears that they are the same object in memory, I get an illegal memory access in the second (Kb
) kernel call.
How can I achieve something similar without having to copy P
from A
to B
? (in the real scenario P
can be expensive to copy everytime)
Upvotes: 2
Views: 559
Reputation: 1605
Object b
is copied to the device when Kb<<<1, 1>>>(b)
is called, but the reference to b_
it contains is a host pointer, therefore you cannot dereference it from the device.
In order to address the issue, you can replace the reference to P
with new instance of P
.
struct B
{
B(A&& a) : b_(std::move(a.a_)) {}
P b_;
};
...
B b(std::move(a));
In this example, the default move constructor in P will copy a_.p_
to b_.p_
. More information on move constructors here.
For future reference, NVIDIA provides users with a tool called compute-sanitizer (shipped as part of the CUDA toolkit) that allows you to locate and address these memory issues easily:
$ compute-sanitizer --show-backtrace=device ./test
========= COMPUTE-SANITIZER
p: 399
========= Invalid __global__ read of size 8 bytes
========= at 0x50 in test.cu:13:P::print()
========= by thread (0,0,0) in block (0,0,0)
========= Address 0x7fff5800cdc0 is out of bounds
========= and is 257,654,049,729 bytes after the nearest allocation at 0x7fc35aa00000 of size 512 bytes
========= Device Frame:test.cu:35:Kb(B) [0x10]
=========
========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaDeviceSynchronize.
=========
test: test.cu:46: int main(): Assertion `(cudaDeviceSynchronize()) == cudaSuccess' failed.
========= Error: process didn't terminate successfully
========= Target application returned an error
========= ERROR SUMMARY: 2 errors
compute-sanitizer offers several features as:
--tool=memcheck
, default if not specified)--tool=initcheck
)--tool=racecheck
)--tool=synccheck
)If you want to learn more about compute-sanitizer, you can find the documentation for the tool here.
Upvotes: 3