Reputation: 51
Can NPP functions, specifically those in npps (https://docs.nvidia.com/cuda/npp/group__npps.html), be called from device code?
If I write a __global__ function, can I call npps functions such as nppsMaxIndx_32f (to compute the max of a vector) inside it?
Example: I have 100 vectors of 10000 floats each. If I do this in host code, I have to make 100 calls to the NPP function.
If I instead launch a __global__ function with 100 threads and call the NPP function inside it, once per vector, so that the calls run simultaneously, will this work? Can nppsMaxIndx_32f be called as a device function?
Upvotes: 3
Views: 393
Reputation: 51
This is not possible: NPP functions are host-only functions. Trying it produces errors:
functions.cu(237): error: calling a __host__ function("nppsMaxIndx_32f") from a __global__ function("computeMax") is not allowed
functions.cu(237): error: identifier "nppsMaxIndx_32f" is undefined in device code
However, if you make the calls from host code without synchronizing the GPU between them, they are issued almost back to back, without waiting for the previous one to finish. This can only be done safely if the calls have no ordering requirement and the data used by overlapping calls is fully independent.
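If the goal is a single launch that covers all 100 vectors, one option outside NPP is a hand-written kernel. The following is only a minimal sketch of that idea, not NPP's method: it assumes the vectors are stored back to back in one device buffer, that each vector has at least one element, and that the data contains no NaNs. The names `argmaxRows`, `batchedArgmax` and `kThreads` are made up for illustration. It uses one block per vector, not one thread per vector, so that each vector is itself scanned in parallel.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <climits>
#include <cmath>

constexpr int kThreads = 256;  // threads per block; must be a power of two

// One block per vector. Each thread scans a strided slice of its vector,
// then the block combines the per-thread candidates in shared memory.
__global__ void argmaxRows(const float* data, int len, float* maxVal, int* maxIdx)
{
    __shared__ float sVal[kThreads];
    __shared__ int   sIdx[kThreads];

    const float* row = data + (size_t)blockIdx.x * len;
    float best = -INFINITY;
    int bestIdx = INT_MAX;  // sentinel: this thread has seen no element yet
    for (int i = threadIdx.x; i < len; i += blockDim.x) {
        if (bestIdx == INT_MAX || row[i] > best) {
            best = row[i];
            bestIdx = i;
        }
    }
    sVal[threadIdx.x] = best;
    sIdx[threadIdx.x] = bestIdx;
    __syncthreads();

    // Tree reduction: keep the larger value; on ties keep the smaller index.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) {
            float v = sVal[threadIdx.x + s];
            int   j = sIdx[threadIdx.x + s];
            if (v > sVal[threadIdx.x] ||
                (v == sVal[threadIdx.x] && j < sIdx[threadIdx.x])) {
                sVal[threadIdx.x] = v;
                sIdx[threadIdx.x] = j;
            }
        }
        __syncthreads();
    }
    if (threadIdx.x == 0) {
        maxVal[blockIdx.x] = sVal[0];
        maxIdx[blockIdx.x] = sIdx[0];
    }
}

// Hypothetical host helper: copies the batch to the GPU, launches one block
// per vector, and copies the per-vector maximum and its index back.
cudaError_t batchedArgmax(const float* hData, int numVectors, int len,
                          float* hMax, int* hIdx)
{
    assert(numVectors > 0 && len > 0);
    float* dData = nullptr;
    float* dMax = nullptr;
    int* dIdx = nullptr;
    const size_t bytes = (size_t)numVectors * len * sizeof(float);

    cudaError_t err = cudaMalloc(&dData, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dMax, numVectors * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc(&dIdx, numVectors * sizeof(int));
    if (err == cudaSuccess) err = cudaMemcpy(dData, hData, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        argmaxRows<<<numVectors, kThreads>>>(dData, len, dMax, dIdx);
        err = cudaGetLastError();  // reports launch configuration errors
    }
    // The blocking copies below also wait for the kernel to finish.
    if (err == cudaSuccess)
        err = cudaMemcpy(hMax, dMax, numVectors * sizeof(float), cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(hIdx, dIdx, numVectors * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dData);
    cudaFree(dMax);
    cudaFree(dIdx);
    return err;
}
```

For the case in the question this would be called as `batchedArgmax(hostData, 100, 10000, maxValues, maxIndices)`, which launches 100 blocks in one go. Whether it beats 100 NPP calls depends on the hardware and data sizes, so it is worth measuring both.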
Upvotes: 1