Reputation: 13
Does CUDA somehow block and transfer all allocated managed memory to the GPU when a kernel is launched? I was experimenting with unified memory and got results that seem strange, at least from my point of view.
I create two arrays and pass A to the kernel; B is never touched by the kernel call, yet it cannot be accessed from the host. The program crashes as soon as I touch B. The output is:
0 0 0 here1
If I comment out the b[0] = 1; line, the code runs fine and prints:
0 0 0 here1 after1 0 here2 1 after2
Why is this happening?
#include <iostream>

__global__ void kernel(int* t)
{
    t[0]++;
}

int main()
{
    int* a;
    int* b;
    std::cout << cudaMallocManaged(&a, sizeof(int) * 100) << std::endl;
    std::cout << cudaMallocManaged(&b, sizeof(int) * 100) << std::endl;
    std::cout << b[0] << std::endl;
    kernel<<<1,1,0,0>>>(a);
    std::cout << "here1" << std::endl;
    b[0] = 1;                      // crashes here
    std::cout << "after1" << std::endl;
    cudaDeviceSynchronize();
    std::cout << b[0] << std::endl;
    std::cout << "here2" << std::endl;
    std::cout << a[0] << std::endl;
    std::cout << "after2" << std::endl;
    return 0;
}
Upvotes: 1
Views: 558
Reputation: 7255
Does cuda somehow block and transfer all allocated managed memory to the GPU when a kernel is launched?
Yes, provided your device is of compute capability less than 6.0.
On such devices, managed memory works by copying all managed allocations to the GPU before a kernel launch and copying them all back to the host on synchronization. In between, any access to managed memory from the host leads to a segmentation fault.
You can be more specific about which allocations should be migrated for a given kernel by attaching them to a stream using cudaStreamAttachMemAsync() and launching the kernel into that stream.
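As a sketch of that approach, applied to the code from the question: attach a to a dedicated stream with cudaMemAttachSingle, and attach b with cudaMemAttachHost so that kernel launches do not migrate it. This assumes a single-GPU, pre-6.0 (pre-Pascal) setup; error checking is omitted for brevity.

```cuda
#include <iostream>
#include <cuda_runtime.h>

__global__ void kernel(int* t)
{
    t[0]++;
}

int main()
{
    int* a;
    int* b;
    cudaMallocManaged(&a, sizeof(int) * 100);
    cudaMallocManaged(&b, sizeof(int) * 100);
    a[0] = 0;   // host access is fine before any kernel has been launched
    b[0] = 0;

    // Attach a to its own stream; attach b to the host so that
    // kernel launches no longer migrate it to the device.
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaStreamAttachMemAsync(stream, a, 0, cudaMemAttachSingle);
    cudaStreamAttachMemAsync(stream, b, 0, cudaMemAttachHost);
    cudaStreamSynchronize(stream);   // let the attachments take effect

    kernel<<<1, 1, 0, stream>>>(a);

    b[0] = 1;   // no crash: b is not associated with the running kernel

    cudaStreamSynchronize(stream);   // a is accessible again after this
    std::cout << a[0] << " " << b[0] << std::endl;

    cudaStreamDestroy(stream);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

Note that a is still inaccessible from the host between the launch and the cudaStreamSynchronize() call; the attachment only narrows the migration to the memory the kernel actually uses.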
Upvotes: 2