Reputation: 5181
So I see a parent question about how to copy from host to the constant memory on GPU using cudaMemcpyToSymbol
.
My question is how to do the reverse, copying from device constant memory to the host using cudaMemcpyFromSymbol
.
In the following minimal reproducible example, I either got
invalid device symbol
error using cudaMemcpyFromSymbol(const_d_a, b, size);
, or segmentation fault
if I use cudaMemcpyFromSymbol(&b, const_d_a, size, cudaMemcpyDeviceToHost)
.I have consulted with the manual which suggests I code as in 1), and this SO question that suggests I code as in 2). Neither of them work here.
Could anyone kindly help suggesting a workaround with this? I must be understanding something improperly... Thanks!
Here is the code:
// a basic CUDA function to test working with device constant memory
#include <stdio.h>
#include <cuda.h>
const unsigned int N = 10; // size of vectors
__constant__ float const_d_a[N * sizeof(float)];
int main()
{
float * a, * b; // a and b are vectors. c is the result
a = (float *)calloc(N, sizeof(float));
b = (float *)calloc(N, sizeof(float));
/**************************** Exp 1: sequential ***************************/
int i;
int size = N * sizeof(float);
for (i = 0; i < N; i++){
a[i] = (float)i / 0.23 + 1;
}
// 1. copy a to constant memory
cudaError_t err = cudaMemcpyToSymbol(const_d_a, a, size);
if (err != cudaSuccess){
printf("%s in %s at line %d\n", cudaGetErrorString(err), __FILE__, __LINE__);
exit(EXIT_FAILURE);
}
cudaError_t err2 = cudaMemcpyFromSymbol(const_d_a, b, size);
if (err2 != cudaSuccess){
printf("%s in %s at line %d\n", cudaGetErrorString(err2), __FILE__, __LINE__);
exit(EXIT_FAILURE);
}
double checksum0, checksum1;
for (i = 0; i < N; i++){
checksum0 += a[i];
checksum1 += b[i];
}
printf("Checksum for elements in host memory is %f\n.", checksum0);
printf("Checksum for elements in constant memory is %f\n.", checksum1);
return 0;
}
Upvotes: 0
Views: 1673
Reputation: 152249
In CUDA, the various cudaMemcpy*
operations are modeled after the C standard library memcpy
routine. In that function, the first pointer is always the destination pointer and the second pointer is always the source pointer. That is true for all cudaMemcpy*
functions as well.
Therefore, if you want to do cudaMemcpyToSymbol
, the symbol had better be the first (destination) argument passed to the function (the second argument would be a host pointer). If you want to do cudaMemcpyFromSymbol
, the symbol needs to be the second argument (the source position), and the host pointer is the first argument. That's not what you have here:
cudaError_t err2 = cudaMemcpyFromSymbol(const_d_a, b, size);
^ ^
| This should be the symbol.
|
This is supposed to be the host destination pointer.
You can discover this with a review of the API documentation.
If we reverse the order of those two arguments in that line of code:
cudaError_t err2 = cudaMemcpyFromSymbol(b, const_d_a, size);
Your code will run with no errors and the final results printed will match.
There is no need to use an ampersand with either of the a
or b
pointers in these functions. a
and b
are already pointers. In the example you linked, pi_gpu_h
is not a pointer. It is an ordinary variable. To copy something to it using cudaMemcpyFromSymbol
, it is necessary to take the address of that ordinary variable, because the function expects a (destination) pointer.
As an aside, this doesn't look right:
__constant__ float const_d_a[N * sizeof(float)];
This is effectively a static array declaration, and apart from the __constant__
decorator it should be done equivalently to how you would do it in C or C++. It's not necessary to multiply N
by sizeof(float)
here, if you want storage for N
float
quantities. Just N
by itself will do that:
__constant__ float const_d_a[N];
however leaving that as-is does not create problems for the code you have posted.
Upvotes: 1