Reputation: 108
CUDA allows computation and data transfer to overlap by using the asynchronous cuMemcpy functions and streams. But is this possible with NPP (NVIDIA Performance Primitives)?
Some background: I am trying to use the GPU through the NPP image resize functions (in our case, nppiResize_8u_C3R). I am using pinned memory and successfully transfer data to the GPU using cuMemcpy2DAsync_v2 and a per-thread stream. The problem is that nppiResize_8u_C3R and all other computation functions do not accept streams.
When I run the NVIDIA Visual Profiler, I see the following:
Upvotes: 1
Views: 913
Reputation: 72348
The problems [sic] is that nppiResize_8u_C3R and all other computation functions do not accept streams.
NPP is fundamentally a stateless API. However, to use streams with NPP, you use nppSetStream to set the default stream for subsequent operations. There are several caveats noted on page 2 of the documentation about using NPP with streams and recommended synchronization practices when switching streams.
Upvotes: 2