NPP: Overlapping computation and data transfer

Question

CUDA allows computation and data transfer to overlap by using the asynchronous cuMemcpy functions and streams. Is the same possible with NPP (NVIDIA Performance Primitives)?

Some background: I am trying to use the GPU through the NPP image resize functions (in our case, nppiResize_8u_C3R). I am using pinned memory and successfully transfer data to the GPU with cuMemcpy2DAsync_v2 on a per-thread stream. The problem is that nppiResize_8u_C3R and all the other computation functions do not accept a stream argument.

When I run the NVIDIA Visual Profiler, I see the following:

Pinned memory allows me to transfer data faster - ~6.524 GB/s.
The percentage of time when memcpy is being performed in parallel with compute is 0%.
