Reputation: 391
I need to compute the median of an array of size p inside a CUDA kernel (in my case, p is small e.g. p = 10). I am using an O(p^2) algorithm for its simplicity, but at the cost of time performance.
Is there a "function" to find the median efficiently that I can call inside a CUDA kernel?
I know I could implement a selection algorithm, but I'm looking for a function and/or tested code.
Thanks!
Upvotes: 3
Views: 5451
Reputation: 24618
Here are a few hints:
There are many other optimizations you can do. Make sure, you read through the CUDA documents, especially the Programming Guide and the Best Practices Guide. When you really want to gun for high performance, don't forget to take a good look at a CUDA profiler, such as the Visual Profiler.
Upvotes: 3
Reputation: 5939
Even in a single thread one can sort the array and pick the value in the middle in O(p*log(p)), which makes O(p^2) look excessive. If you have p threads at your disposal it's also possible to sort the array as fast as O(log(p)), although that may not be the fastest solution for small p. See the top answer here:
Which parallel sorting algorithm has the best average case performance?
Upvotes: 1