Reputation: 2552
I have some working code... where I allocate a device variable pointer as follows:
float *d_var;
cudaMalloc(&d_var, sizeof(float) );
Later on in my code, I want to copy the contents of this variable to a host variable (ref):
checkCudaErrors(cudaMemcpy(&h_var, &d_var, sizeof(float), cudaMemcpyDeviceToHost));
Which works great! But using cudaMalloc is slow! So I want to allocate the variable without cudaMalloc, using a __device__ definition instead:
__device__ float d_var = 1000000000.0f;
This works great and I know the d_var in this case is initialized properly and I can do all my work with it like before. I've been printf'ing its contents, so I know it has the right contents. But when I try to copy the contents now to my host var using the same code as before:
checkCudaErrors(cudaMemcpy(&h_var, &d_var, sizeof(float), cudaMemcpyDeviceToHost));
I get a really vague error:
invalid argument cudaMemcpy(&h_var, &d_var, sizeof(float), cudaMemcpyDeviceToHost)
I've tried referring to the variable as &d_var, d_var, and *d_var, to no avail.
Any help MUCH appreciated.
Thanks!
Upvotes: 2
Views: 641
Reputation: 2552
Bah, I figured it out...
Instead of calling cudaMemcpy(), I need to use cudaMemcpyFromSymbol():
checkCudaErrors(cudaMemcpyFromSymbol(&h_var, d_var, sizeof(float), 0, cudaMemcpyDeviceToHost));
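For anyone landing here later, my understanding of why this works: a __device__ variable is a symbol, and taking &d_var in host code does not give an address that cudaMemcpy can use as a device pointer, hence the "invalid argument". Below is a minimal sketch of the two ways I know to read it back, assuming a single float as in the question. CHECK_CUDA is a hypothetical stand-in for checkCudaErrors (which comes from the CUDA samples' helper headers), and the function names readBySymbol and readByAddress are made up for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for the checkCudaErrors helper used in the post.
#define CHECK_CUDA(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s at %s:%d\n", cudaGetErrorString(err_), \
                    __FILE__, __LINE__);                               \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Statically allocated device variable, as in the question.
__device__ float d_var = 1000000000.0f;

// Option 1: copy by symbol. Pass the symbol itself, not its address.
float readBySymbol() {
    float h_var = 0.0f;
    CHECK_CUDA(cudaMemcpyFromSymbol(&h_var, d_var, sizeof(float), 0,
                                    cudaMemcpyDeviceToHost));
    return h_var;
}

// Option 2: ask the runtime for the real device address of the symbol,
// then use an ordinary cudaMemcpy with that pointer.
float readByAddress() {
    float h_var = 0.0f;
    void *d_ptr = nullptr;
    CHECK_CUDA(cudaGetSymbolAddress(&d_ptr, d_var));
    CHECK_CUDA(cudaMemcpy(&h_var, d_ptr, sizeof(float),
                          cudaMemcpyDeviceToHost));
    return h_var;
}

int main() {
    printf("by symbol:  %f\n", readBySymbol());
    printf("by address: %f\n", readByAddress());
    return 0;
}
```

Option 2 is probably only worth it if you need a raw device pointer to hand to other APIs; for a one-off read, cudaMemcpyFromSymbol is simpler.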
Upvotes: 4