erogol

Reputation: 13614

How to define a global memory array in device code and pass its values to the host after execution?

I am trying to create a global memory array in the kernel code and, after the execution has finished, copy the array contents to host memory. Is it possible to create a global memory array dynamically at device-code scope, or do I need to define it outside the device code as a global array?

__global__ void kernel_code(...,int array_size){
    __device__ int array_data[array_size];  // what I would like to do

    // fill the array_data 
    ...
}

int main(){
    //pass data from array_data to host array
}

Is it possible to do that? If not, what is the most common practice?

Upvotes: 0

Views: 1405

Answers (2)

axon

Reputation: 1200

The normal practice is to manipulate device memory only in kernels (it's much faster). Then use cudaMemcpy(dst, src, count, cudaMemcpyDeviceToHost) in main() to copy the data into host memory; note that cudaMemcpy also needs the number of bytes to copy.
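A minimal sketch of that pattern, assuming a buffer allocated with cudaMalloc and a hypothetical kernel fill_kernel that writes each thread's index into it. The point to notice is the four-argument form of cudaMemcpy: destination, source, byte count, and direction.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread writes its own global index into the buffer.
__global__ void fill_kernel(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

int main() {
    const int n = 256;
    int host[n];
    int* dev = nullptr;

    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return 1;
    fill_kernel<<<(n + 127) / 128, 128>>>(dev, n);

    // cudaMemcpy(dst, src, count_in_bytes, kind). On the default stream it
    // waits for the preceding kernel to finish before copying.
    cudaError_t err = cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "copy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("host[0] = %d, host[%d] = %d\n", host[0], n - 1, host[n - 1]);
    cudaFree(dev);
    return 0;
}
```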

Upvotes: 0

Robert Crovella

Reputation: 151809

The array must be allocated statically by the compiler, so its size has to be a compile-time constant. You cannot declare its size with a parameter that you pass to a kernel.

Furthermore, a __device__ variable declaration is not allowed inside a function body. So it has to be at global scope in your module, not at function scope.

Apart from that, you can pass data between a statically declared device array and a host array. The __device__ variable has the following characteristics:

  • Resides in global memory space,
  • Has the lifetime of an application,
  • Is accessible from all the threads within the grid and from the host through the runtime library (cudaGetSymbolAddress() / cudaGetSymbolSize() / cudaMemcpyToSymbol() / cudaMemcpyFromSymbol()).
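As a sketch of the host-side access listed above: cudaGetSymbolAddress returns an ordinary device pointer for a __device__ variable, and cudaGetSymbolSize returns its size in bytes, so plain cudaMemcpy can then be used on it. This assumes CUDA 5.0 or later, where the symbol is passed as the variable itself rather than as a string; ARRAY_SIZE and square_kernel are hypothetical names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define ARRAY_SIZE 64  // hypothetical; must be a compile-time constant

// File-scope __device__ array: lives in global memory for the whole application.
__device__ int array_data[ARRAY_SIZE];

// Hypothetical kernel that fills the array with squares.
__global__ void square_kernel() {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < ARRAY_SIZE) array_data[i] = i * i;
}

int main() {
    void* dev_ptr = nullptr;
    size_t bytes = 0;

    // Look up the device address and byte size of the symbol.
    cudaGetSymbolAddress(&dev_ptr, array_data);
    cudaGetSymbolSize(&bytes, array_data);

    square_kernel<<<1, ARRAY_SIZE>>>();

    // dev_ptr is a normal device pointer, so cudaMemcpy works on it.
    int host[ARRAY_SIZE];
    cudaError_t err = cudaMemcpy(host, dev_ptr, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "copy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("bytes = %zu, host[10] = %d\n", bytes, host[10]);
    return 0;
}
```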

So in your host code, you would use cudaMemcpyToSymbol to transfer data from your host array to the device array, and cudaMemcpyFromSymbol to transfer data from the device array to the host array.
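A minimal round trip under those rules might look like the following. It assumes CUDA 5.0 or later (symbols passed as variables, not strings); ARRAY_SIZE and double_kernel are hypothetical names. The array is declared at file scope with a compile-time size, as required above, and the kernel takes no size parameter.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define ARRAY_SIZE 128  // hypothetical compile-time size

// Declared at global (file) scope, not inside the kernel.
__device__ int array_data[ARRAY_SIZE];

// Hypothetical kernel that doubles every element in place.
__global__ void double_kernel() {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < ARRAY_SIZE) array_data[i] *= 2;
}

int main() {
    int host_in[ARRAY_SIZE], host_out[ARRAY_SIZE];
    for (int i = 0; i < ARRAY_SIZE; ++i) host_in[i] = i;

    // Host array -> device symbol.
    cudaMemcpyToSymbol(array_data, host_in, sizeof(host_in));

    double_kernel<<<(ARRAY_SIZE + 63) / 64, 64>>>();

    // Device symbol -> host array (waits for the kernel on the default stream).
    cudaError_t err = cudaMemcpyFromSymbol(host_out, array_data, sizeof(host_out));
    if (err != cudaSuccess) {
        fprintf(stderr, "copy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("host_out[5] = %d, host_out[127] = %d\n", host_out[5], host_out[127]);
    return 0;
}
```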

For dynamically sized device arrays, the most common practice would be to allocate them using ordinary host runtime API functions like cudaMalloc and transfer data from a host array to a device array, or vice versa, using cudaMemcpy.
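A sketch of the dynamic case, adapted from the question's kernel_code: the size is chosen at run time (here, hypothetically, from the first command-line argument, defaulting to 1000), the buffer comes from cudaMalloc on the host, and the kernel receives both the pointer and array_size as ordinary parameters instead of declaring the array itself.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Adapted from the question: the array is passed in, and array_size is a
// runtime parameter used only for bounds checking.
__global__ void kernel_code(int* array_data, int array_size) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < array_size) array_data[i] += 1;
}

int main(int argc, char** argv) {
    // Hypothetical: size taken from the command line, default 1000.
    int array_size = (argc > 1) ? std::atoi(argv[1]) : 1000;
    if (array_size <= 0) return 1;
    size_t bytes = array_size * sizeof(int);

    std::vector<int> host(array_size, 41);
    int* dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return 1;

    // Host -> device, run the kernel, then device -> host.
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);
    int threads = 256;
    int blocks = (array_size + threads - 1) / threads;
    kernel_code<<<blocks, threads>>>(dev, array_size);
    cudaError_t err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "copy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("array_size = %d, host[0] = %d, host[last] = %d\n",
           array_size, host[0], host[array_size - 1]);
    return 0;
}
```

Because the size is only known at run time, the allocation has to happen on the host; the kernel never needs a statically sized declaration.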

Upvotes: 1
