Reputation: 13614
I am trying to create a device global memory array in the kernel code and, after execution finishes, pass the array contents to host memory. Is it possible to create a global memory array dynamically at device code scope, or do I need to define the array outside the device code scope as a global array?
__global__ void kernel_code(..., int array_size){
    __device__ int array_data[array_size];
    // fill array_data
    ...
}

int main(){
    // pass data from array_data to a host array
}
Is it possible to do that? If not, what is the usual practice?
Upvotes: 0
Views: 1405
Reputation: 1200
Normal practice is to manipulate device memory only in kernels (it's much faster). Simply use cudaMemcpy(dst, src, count, cudaMemcpyDeviceToHost) to copy the data into host memory (in main()). Note that cudaMemcpy takes a byte count as its third argument.
Upvotes: 0
Reputation: 151809
The allocation of the array must be performed statically by the compiler, so you cannot make its size a parameter that you pass to the kernel.
Furthermore, a __device__ variable declaration is not allowed inside a function body, so it has to be at global scope in your module, not at function scope.
Apart from that, you can pass data between a statically declared device array and a host array. A __device__ variable resides in global memory, has the lifetime of the application, and is accessible from the host through the runtime API. So in your host code, you would use cudaMemcpyToSymbol to transfer data from your host array to the device array, and cudaMemcpyFromSymbol to transfer data from the device array to the host array.
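As a minimal sketch of the static approach (names like `ARRAY_SIZE`, `array_data`, and `kernel_code` are illustrative, not from your code): the array size must be a compile-time constant, the `__device__` declaration sits at file scope, and the host reads the results back with cudaMemcpyFromSymbol.

```cuda
#include <cstdio>

#define ARRAY_SIZE 32                    // must be known at compile time

__device__ int array_data[ARRAY_SIZE];   // file scope, not inside the kernel

__global__ void kernel_code() {
    int i = threadIdx.x;
    if (i < ARRAY_SIZE) array_data[i] = i * i;   // fill the array
}

int main() {
    int h_data[ARRAY_SIZE];
    kernel_code<<<1, ARRAY_SIZE>>>();
    // device -> host; the symbol itself is passed, not a device pointer
    cudaMemcpyFromSymbol(h_data, array_data, ARRAY_SIZE * sizeof(int));
    printf("%d\n", h_data[3]);   // squares were written, so this is 9
    return 0;
}
```

Note that you never take the address of `array_data` on the host; the symbol-based copy functions exist precisely because a `__device__` variable's address is only meaningful in device code.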
For dynamically sized device arrays, the most common practice is to allocate them using ordinary host runtime API functions like cudaMalloc, and to transfer data between a host array and a device array (in either direction) using cudaMemcpy.
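A sketch of that dynamic approach, assuming the size is only known at runtime (names are illustrative): the kernel receives a device pointer and the size as ordinary parameters, which is what the question's code was reaching for.

```cuda
#include <cstdio>
#include <cstdlib>

__global__ void kernel_code(int *array_data, int array_size) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < array_size) array_data[i] = i;   // fill the array
}

int main() {
    int n = 1000;                            // size known only at runtime
    int *d_data;
    int *h_data = (int *)malloc(n * sizeof(int));

    cudaMalloc(&d_data, n * sizeof(int));    // allocate on the device
    kernel_code<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaMemcpy(h_data, d_data, n * sizeof(int), cudaMemcpyDeviceToHost);

    printf("%d\n", h_data[42]);              // each slot holds its index: 42
    cudaFree(d_data);
    free(h_data);
    return 0;
}
```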
Upvotes: 1