Adarsh

Reputation: 453

What is the most efficient way to get a single element from a device array in CUDA?

I have a very large array on the device, and for a condition check I want to access (on the host/CPU) only one element from the middle of it, say the Nth element. What is the most efficient way to do this?

Do I need to write a kernel that copies the Nth element of the source array into a single-element array, and then copy that single-element array to the host?
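For reference, the staging approach described in the question could look roughly like the sketch below (CUDA runtime API; copyOneElement and readElementViaKernel are hypothetical names). It needs a temporary allocation, a kernel launch and a copy. It is not required, though: cudaMemcpy accepts any device address as its source, including one that points into the middle of the original array, so the copy alone is enough.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// One thread copies src[n] into a one-element device buffer.
__global__ void copyOneElement(const int* src, std::size_t n, int* dst)
{
    *dst = src[n];
}

// Staging approach from the question: a kernel copies src[n] into a temporary
// one-element device array, then that array is copied to the host.
cudaError_t readElementViaKernel(const int* dSrc, std::size_t n, int* hostValue)
{
    int* dOne = nullptr;
    cudaError_t err = cudaMalloc(&dOne, sizeof(int));
    if (err != cudaSuccess) return err;

    copyOneElement<<<1, 1>>>(dSrc, n, dOne);
    err = cudaGetLastError();   // reports launch errors
    if (err == cudaSuccess) {
        // Blocking cudaMemcpy on the default stream runs after the kernel completes.
        err = cudaMemcpy(hostValue, dOne, sizeof(int), cudaMemcpyDeviceToHost);
    }

    cudaFree(dOne);
    return err;
}
```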

Upvotes: 0

Views: 1066

Answers (2)

Nick Hockings

Reputation: 29

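One addendum to answer 1: you may need to account for the number of bytes per element of your array. With the driver API, a CUdeviceptr is an integer byte address, so adding an offset to it moves that many bytes, not elements. For example, for an array of arrays of various types on the device: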

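#ifdef CUDA_KERNEL
    char*         mgpu[ MAX_BUF ];    // Device code: array of pointers to arrays of various types.
#else
    CUdeviceptr   mgpu[ MAX_BUF ];    // Host code: each entry is a device pointer (a byte address).
    CUdeviceptr   gpu ( int n )  { return mgpu[n]; }
#endif

int CPUelement;                                  // host copy of the element; `offset` is its index
CUdeviceptr GPUpointer = m_Fluid.gpu(FGRIDOFF);  // Device pointer to the FGRIDOFF (int) array
// CUdeviceptr arithmetic is in bytes, so scale the element index by sizeof(int).
cuMemcpyDtoH ( &CPUelement, GPUpointer + (offset * sizeof(int)), sizeof(int) );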
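A minimal driver-API sketch of the same idea, assuming a CUDA context is already current on the calling thread (for example one obtained with cuDevicePrimaryCtxRetain and cuCtxSetCurrent) and that the program links against libcuda. readElementDriver is a hypothetical helper, not part of the driver API. Because CUdeviceptr is an unsigned integer byte address, the helper multiplies the index by sizeof(T) itself:

```cuda
#include <cuda.h>
#include <cstddef>

// Reads element `index` of a device array of T, given its base address as a CUdeviceptr.
// CUdeviceptr holds a BYTE address, so adding `index` directly would move `index`
// bytes; scaling by sizeof(T) moves `index` elements instead.
// Assumes a CUDA context is already current on the calling thread.
template <typename T>
CUresult readElementDriver(CUdeviceptr base, std::size_t index, T* hostValue)
{
    const CUdeviceptr elementAddr = base + index * sizeof(T);
    return cuMemcpyDtoH(hostValue, elementAddr, sizeof(T));
}
```

For the code above, the equivalent call would be readElementDriver(GPUpointer, offset, &CPUelement).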

Upvotes: 0

pSoLT

Reputation: 1052

You can copy a single element of an array with cudaMemcpy. Say you want to copy the N-th element of the device array

int * dSourceArray;

into the host variable

int hTargetVariable;

You can apply device pointer arithmetic on the host. All you need to do is offset the dSourceArray pointer by N elements and copy a single element. Because dSourceArray is a typed int pointer, dSourceArray+N already advances N * sizeof(int) bytes:

cudaMemcpy(&hTargetVariable, dSourceArray+N, sizeof(int), cudaMemcpyDeviceToHost);
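Wrapped as a small reusable helper (a sketch; readDeviceElement is a hypothetical name), the runtime-API version relies on typed pointer arithmetic, so the element size is applied automatically and the same code works for int, float, double and so on:

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Copies element n of a device array to the host with a single cudaMemcpy.
// dSourceArray + n is ordinary typed pointer arithmetic computed on the host:
// it advances n * sizeof(T) bytes, and the device pointer is never dereferenced here.
template <typename T>
cudaError_t readDeviceElement(const T* dSourceArray, std::size_t n, T* hTarget)
{
    return cudaMemcpy(hTarget, dSourceArray + n, sizeof(T), cudaMemcpyDeviceToHost);
}
```

For the variables above, the call is readDeviceElement(dSourceArray, N, &hTargetVariable).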

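Keep in mind that if you use multiple streams, you need to make sure the work that writes the element has finished (for example with cudaDeviceSynchronize() or cudaStreamSynchronize() on the relevant stream) before transferring the data.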
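One way to handle the multi-stream case, sketched under the assumption that the element is produced by a kernel in a non-blocking stream: enqueue the one-element copy in that same stream with cudaMemcpyAsync, so stream order guarantees it runs after the kernel, and then wait on that stream only. The kernel fillSquares and the function fillAndReadElement are hypothetical. The host destination is pinned with cudaMallocHost because asynchronous device-to-host copies into pageable memory are not truly asynchronous.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical producer kernel: fills data[i] = i * i.
__global__ void fillSquares(int* data, std::size_t count)
{
    std::size_t i = blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
    if (i < count) data[i] = static_cast<int>(i * i);
}

// Launches the producer and then reads element n back, both in one non-blocking stream.
// Stream order guarantees the copy starts only after fillSquares has finished.
cudaError_t fillAndReadElement(std::size_t count, std::size_t n, int* result)
{
    if (n >= count) return cudaErrorInvalidValue;   // index must lie inside the array

    cudaStream_t stream;
    cudaError_t err = cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);
    if (err != cudaSuccess) return err;

    int* dData = nullptr;
    int* hPinned = nullptr;   // pinned host memory so cudaMemcpyAsync can run asynchronously
    err = cudaMalloc(&dData, count * sizeof(int));
    if (err == cudaSuccess)
        err = cudaMallocHost(reinterpret_cast<void**>(&hPinned), sizeof(int));

    if (err == cudaSuccess) {
        const unsigned threads = 256;
        const unsigned blocks = static_cast<unsigned>((count + threads - 1) / threads);
        fillSquares<<<blocks, threads, 0, stream>>>(dData, count);
        err = cudaGetLastError();                   // reports launch errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpyAsync(hPinned, dData + n, sizeof(int),
                              cudaMemcpyDeviceToHost, stream);

    // Wait for everything queued in this stream (kernel and copy) before reading
    // hPinned on the host or freeing buffers the stream may still be using.
    cudaError_t syncErr = cudaStreamSynchronize(stream);
    if (err == cudaSuccess) err = syncErr;
    if (err == cudaSuccess) *result = *hPinned;

    if (hPinned) cudaFreeHost(hPinned);
    if (dData) cudaFree(dData);
    cudaStreamDestroy(stream);
    return err;
}
```

Synchronizing only the stream, rather than the whole device, avoids waiting for unrelated work queued in other streams.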

Upvotes: 2
