CUDA cudaMemcpyFromSymbol "invalid device symbol" error?

Question

So I see a parent question about how to copy from host to the constant memory on GPU using cudaMemcpyToSymbol.

My question is how to do the reverse, copying from device constant memory to the host using cudaMemcpyFromSymbol.

In the following minimal reproducible example, I either got

1) invalid device symbol error using cudaMemcpyFromSymbol(const_d_a, b, size);, or
2) got segmentation fault if I use cudaMemcpyFromSymbol(&b, const_d_a, size, cudaMemcpyDeviceToHost).

I have consulted with the manual which suggests I code as in 1), and this SO question that suggests I code as in 2). Neither of them work here.

Could anyone kindly help suggesting a workaround with this? I must be understanding something improperly... Thanks!

Here is the code:

// a basic CUDA function to test working with device constant memory
#include 
#include 

const unsigned int N = 10;    // size of vectors

__constant__ float const_d_a[N * sizeof(float)];  

int main()
{
    float * a, * b;  // a and b are vectors. c is the result
    a = (float *)calloc(N, sizeof(float));
    b = (float *)calloc(N, sizeof(float));

    /**************************** Exp 1: sequential ***************************/
    int i;
    int size = N * sizeof(float);
    for (i = 0; i < N; i++){
        a[i] = (float)i / 0.23 + 1;
    }


    // 1. copy a to constant memory
    cudaError_t err = cudaMemcpyToSymbol(const_d_a, a, size);
    if (err != cudaSuccess){
        printf("%s in %s at line %d
", cudaGetErrorString(err), __FILE__, __LINE__);
        exit(EXIT_FAILURE);
    }

    cudaError_t err2 = cudaMemcpyFromSymbol(const_d_a, b, size);
    if (err2 != cudaSuccess){
        printf("%s in %s at line %d
", cudaGetErrorString(err2), __FILE__, __LINE__);
        exit(EXIT_FAILURE);
    }

    double checksum0, checksum1;
    for (i = 0; i < N; i++){
        checksum0 += a[i];
        checksum1 += b[i];
    }

    printf("Checksum for elements in host memory is %f
.", checksum0);
    printf("Checksum for elements in constant memory is %f
.", checksum1);

    return 0;
}

Robert Crovella · Accepted Answer

In CUDA, the various cudaMemcpy* operations are modeled after the C standard library memcpy routine. In that function, the first pointer is always the destination pointer and the second pointer is always the source pointer. That is true for all cudaMemcpy* functions as well.

Therefore, if you want to do cudaMemcpyToSymbol, the symbol had better be the first (destination) argument passed to the function (the second argument would be a host pointer). If you want to do cudaMemcpyFromSymbol, the symbol needs to be the second argument (the source position), and the host pointer is the first argument. That's not what you have here:

cudaError_t err2 = cudaMemcpyFromSymbol(const_d_a, b, size);
                                          ^        ^
                                          |       This should be the symbol.
                                          |  
                                   This is supposed to be the host destination pointer.

You can discover this with a review of the API documentation.

If we reverse the order of those two arguments in that line of code:

cudaError_t err2 = cudaMemcpyFromSymbol(b, const_d_a, size);

Your code will run with no errors and the final results printed will match.

There is no need to use an ampersand with either of the a or b pointers in these functions. a and b are already pointers. In the example you linked, pi_gpu_h is not a pointer. It is an ordinary variable. To copy something to it using cudaMemcpyFromSymbol, it is necessary to take the address of that ordinary variable, because the function expects a (destination) pointer.

As an aside, this doesn't look right:

__constant__ float const_d_a[N * sizeof(float)];

This is effectively a static array declaration, and apart from the __constant__ decorator it should be done equivalently to how you would do it in C or C++. It's not necessary to multiply N by sizeof(float) here, if you want storage for N float quantities. Just N by itself will do that:

__constant__ float const_d_a[N];

however leaving that as-is does not create problems for the code you have posted.

CUDA cudaMemcpyFromSymbol "invalid device symbol" error?

Answers (1)

Related Questions

CUDA cudaMemcpyFromSymbol &quot;invalid device symbol&quot; error?

Answers (1)

Related Questions

CUDA cudaMemcpyFromSymbol "invalid device symbol" error?