Giulio Graziani
Giulio Graziani

Reputation: 21

Illegal memory access when using a reference to cudaMalloc'd pointer

I want to create a P object inside a class and have a P& inside a second class.

This is P:

struct P
{
    int* p_;

    P() { CC(cudaMalloc(&p_, sizeof(int))); }

    __device__ void print() { printf("p: %d\n", p_[0]); }
};

These are my classes:

struct A
{
    P a_;
};

struct B
{
    B(A& a) : b_(a.a_) {}
    P& b_;
};

This is how I use P on the device:

__global__ void Ka(A a)
{
    a.a_.p_[0] = 399;
    a.a_.print();
}

__global__ void Kb(B b)
{
    b.b_.print();
}

And this is how I call it:

A a;
Ka << <1, 1 >> > (a);
CC(cudaDeviceSynchronize());
CC(cudaGetLastError());

B b(a);
Kb << <1, 1 >> > (b);
CC(cudaDeviceSynchronize());
CC(cudaGetLastError());

I'm confused because while on the debugger it appears that they are the same object in memory, I get an illegal memory access in the second (Kb) kernel call.

How can I achieve something similar without having to copy P from A to B? (in the real scenario P can be expensive to copy everytime)

Upvotes: 2

Views: 559

Answers (1)

Anis Ladram
Anis Ladram

Reputation: 1605

Object b is copied to the device when Kb<<<1, 1>>>(b) is called, but the reference to b_ it contains is a host pointer, therefore you cannot dereference it from the device.

In order to address the issue, you can replace the reference to P with new instance of P.

struct B
{
    B(A&& a) : b_(std::move(a.a_)) {}
    P b_;
};

...

B b(std::move(a));

In this example, the default move constructor in P will copy a_.p_ to b_.p_. More information on move constructors here.

For future reference, NVIDIA provides users with a tool called compute-sanitizer (shipped as part of the CUDA toolkit) that allows you to locate and address these memory issues easily:

$ compute-sanitizer --show-backtrace=device ./test
========= COMPUTE-SANITIZER
p: 399
========= Invalid __global__ read of size 8 bytes
=========     at 0x50 in test.cu:13:P::print()
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7fff5800cdc0 is out of bounds
=========     and is 257,654,049,729 bytes after the nearest allocation at 0x7fc35aa00000 of size 512 bytes
=========     Device Frame:test.cu:35:Kb(B) [0x10]
=========
========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaDeviceSynchronize.
=========
test: test.cu:46: int main(): Assertion `(cudaDeviceSynchronize()) == cudaSuccess' failed.
========= Error: process didn't terminate successfully
========= Target application returned an error
========= ERROR SUMMARY: 2 errors

compute-sanitizer offers several features as:

  • Out-of-bounds memory accesses detection (with option --tool=memcheck, default if not specified)
  • Uninitialized memory accesses detection (with option --tool=initcheck)
  • Data race detection on shared memory (with option --tool=racecheck)
  • Invalid synchronization primitives usage (with option --tool=synccheck)

If you want to learn more about compute-sanitizer, you can find the documentation for the tool here.

Upvotes: 3

Related Questions