venkywonka

Reputation: 145

CUDA constant memory usage across multiple source files showing different behaviors on cuda-11.2 and cuda-11.4

Minimum repro:

kernel.cu:

#include <stdio.h>

__constant__ int N_GPU;

void wrapper_fn(int *ptr)
{
  // The fourth parameter is the byte offset into the symbol; the copy kind is the fifth.
  cudaMemcpyToSymbol(N_GPU, ptr, sizeof(int), 0, cudaMemcpyDeviceToDevice);
}

__global__ void printKernel() {

    printf("N = %d; \n", N_GPU);

}

driver.cu:

#include "cuda_runtime.h"
#include <stdio.h>

void wrapper_fn(int*);
__global__ void printKernel();

int main()
{
    int N = 10;
    int* d_N_ptr;
    cudaMalloc(&d_N_ptr, sizeof(int));
    cudaMemcpy(d_N_ptr, &N, sizeof(int), cudaMemcpyDefault);

    wrapper_fn(d_N_ptr);

    printKernel <<<1, 1 >>>();
    cudaPeekAtLastError();
    cudaDeviceSynchronize();

    return 0;
}

On both cuda-11.2 and cuda-11.4, running nvcc kernel.cu driver.cu compiles without errors. The expected output (i.e. N = 10;) appears only with 11.2, not with 11.4. Running the 11.4 build under cuda-gdb gives the following:

...
[New Thread 0x7fffee240700 (LWP 54339)]
warning: Cuda API error detected: cudaMalloc returned (0xde)

warning: Cuda API error detected: cudaMemcpy returned (0xde)

warning: Cuda API error detected: cudaMemcpyToSymbol returned (0xde)

warning: Cuda API error detected: cudaLaunchKernel returned (0xde)

warning: Cuda API error detected: cudaPeekAtLastError returned (0xde)

warning: Cuda API error detected: cudaDeviceSynchronize returned (0xde)

[Thread 0x7fffee240700 (LWP 54339) exited]
...
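The same failure can be surfaced without cuda-gdb by checking the return value of every runtime call. Below is a minimal single-file sketch of the repro; the CUDA_CHECK macro is a hypothetical helper (not part of the CUDA API), and everything else mirrors the code above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (an assumption for this sketch, not a CUDA API):
// aborts with the file, line, and error string if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s failed: %s (%d)\n",            \
                    __FILE__, __LINE__, #call,                        \
                    cudaGetErrorString(err_), (int)err_);             \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

__constant__ int N_GPU;

__global__ void printKernel()
{
    printf("N = %d; \n", N_GPU);
}

int main()
{
    int N = 10;
    int *d_N_ptr = nullptr;

    CUDA_CHECK(cudaMalloc(&d_N_ptr, sizeof(int)));
    CUDA_CHECK(cudaMemcpy(d_N_ptr, &N, sizeof(int), cudaMemcpyHostToDevice));

    // Offset 0 into the symbol, device-to-device copy.
    CUDA_CHECK(cudaMemcpyToSymbol(N_GPU, d_N_ptr, sizeof(int), 0,
                                  cudaMemcpyDeviceToDevice));

    printKernel<<<1, 1>>>();
    CUDA_CHECK(cudaGetLastError());       // launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel runs

    CUDA_CHECK(cudaFree(d_N_ptr));
    return 0;
}
```

With checks like these, a setup that produces the warnings above should stop at the first failing call and print its error string instead of silently printing nothing.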

Are there any nvcc flags that matter in 11.4 that I'm missing, or any API changes I've overlooked? Thanks in advance!

Upvotes: 0

Views: 239

Answers (1)

venkywonka

Reputation: 145

The answer turned out to be my driver version. The error code in the cuda-gdb output (0xde = 222, cudaErrorUnsupportedPtxVersion) means that the compiled PTX is too new for the installed driver (mine was 460.35). The 11.4 binary was running on my older driver through "CUDA Enhanced Compatibility", and that older driver could not JIT-compile the newer PTX.
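To check for this kind of mismatch on a given machine, a small host-only sketch like the following may help. It prints the CUDA version the driver supports, the version of the runtime the program was built against, and the runtime's own name for error code 0xde. It uses only standard runtime API calls; the printed text depends on the installed toolkit and driver, so no output is shown here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int driverVersion = 0;
    int runtimeVersion = 0;

    // Latest CUDA version supported by the installed driver.
    cudaDriverGetVersion(&driverVersion);
    // Version of the CUDA runtime this program was built against.
    cudaRuntimeGetVersion(&runtimeVersion);

    // Versions are encoded as 1000 * major + 10 * minor (e.g. 11040 for 11.4).
    printf("Driver supports up to CUDA %d.%d\n",
           driverVersion / 1000, (driverVersion % 1000) / 10);
    printf("Runtime is CUDA %d.%d\n",
           runtimeVersion / 1000, (runtimeVersion % 1000) / 10);

    // Ask the runtime what it calls the code reported by cuda-gdb.
    const cudaError_t code = static_cast<cudaError_t>(0xde);
    printf("0x%x (%d): %s: %s\n", 0xde, 0xde,
           cudaGetErrorName(code), cudaGetErrorString(code));

    return 0;
}
```

If the runtime version is newer than the version the driver reports, any kernel that has to be JIT-compiled from PTX is a likely candidate for this error.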

TL;DR: compiling to the exact architecture-specific SASS solved the problem on cuda 11.4. I did this by adding the -arch compute_70 flag to my nvcc compilation command.
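One way to confirm what the runtime actually loads for a kernel is cudaFuncGetAttributes, which reports the binary (SASS) and PTX versions of the selected image. The sketch below is illustrative only: dummyKernel and the file name are made up, and the build line in the comment assumes a compute capability 7.0 GPU. As far as I understand the nvcc documentation, a real architecture name such as sm_70 is what embeds SASS, whereas a virtual name such as compute_70 on its own embeds PTX that still goes through the driver's JIT, so treat the exact flag as something to verify on your setup.

```cuda
// Assumed build line for a compute capability 7.0 device (adjust to your GPU):
//   nvcc -arch=sm_70 check_arch.cu -o check_arch
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel, used only so there is a function to query.
__global__ void dummyKernel() {}

int main()
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, dummyKernel);
    if (err != cudaSuccess) {
        // With a PTX/driver mismatch, this is expected to fail as well.
        fprintf(stderr, "cudaFuncGetAttributes failed: %s (%d)\n",
                cudaGetErrorString(err), (int)err);
        return 1;
    }

    // Both values use the form 10 * major + minor (e.g. 70 for 7.0).
    printf("binaryVersion = %d, ptxVersion = %d\n",
           attr.binaryVersion, attr.ptxVersion);
    return 0;
}
```

If the call succeeds and binaryVersion matches the GPU's compute capability, the kernel is running from precompiled SASS and the driver's PTX JIT is not involved.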

Upvotes: 3
