taoyuanjl
taoyuanjl

Reputation: 163

Recommend a fast sorting algorithm for local order among the segment in the array

Sorting data in each segment in an array on GPU, the size of segment is 32, and there are no sorting or merging further for different segments. So I load the data of the each segment into the shared memory from global memory, and store the data into the global memory after I finished sorting of each segment. What's the parallel algorithm is prefer for higher throughput?

Upvotes: 0

Views: 427

Answers (2)

einpoklum
einpoklum

Reputation: 131960

I suggest you use the in-warp bitonic sort, which is implemented very efficiently using the SHFL instruction of the Kepler architecture. See code in this GTC 2013 presentation:

Kepler's SHUFFLE(SHFL) Instruction: Tips and Tricks

Using it will also mean you won't need to bother with shared memory, just load one value by each thread into a register.

Upvotes: 2

Farzad
Farzad

Reputation: 3438

Since segment sizes are all 32, I personally suggest merge sort. There's also this paper you can refer to.

Upvotes: 1

Related Questions