user1123502

Reputation: 368

GPU-accelerated sort (~1GB) and merge sort (~100GB)

I'm looking for a C++ library for GPU-accelerated sorting (around 1GB of data) and merge sorting (say, around 100GB of data, though the size does not matter, because merging is a streaming algorithm). The license has to be LGPL, BSD or similar. I strongly prefer OpenCL because of portability (but I am also interested in links to CUDA libraries). I would appreciate links to papers and blog posts on this subject.

Some background (please correct me if I am wrong):

A 2-way merge sort of 1GB (that is, 128 000 000 8-byte elements) will generate approximately log2(128 000 000) · 1GB ≈ 27GB of memory traffic, which takes around 1 second on a modern CPU with a sequential memory bandwidth of ~30GB/s. (Any non-merge sort seems to take much longer, because non-sequential memory access is 10-100 times slower.)

Although I am not familiar with modern GPUs, I suspect that a merge sort of 1GB will take 0.2 seconds or even less, because typical GPU memory bandwidth is around 150GB/s, as on the AMD/ATI 58xx series (see, for example, http://en.wikipedia.org/wiki/Comparison_of_AMD_graphics_processing_units#Evergreen_.28HD_5xxx.29_series).

That is at least a 5x speedup. (Transferring 1GB over 16x PCI-E 2.0 takes around 0.125s, but it seems possible to overlap PCI-E transfers with sorting; however, this may require 2GB or 3GB of video card memory instead of 1GB.)
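The back-of-the-envelope numbers above can be reproduced with a small helper. This is only a sketch of the same estimate: the function names are made up for illustration, the level count is rounded up to a whole number of merge passes, and traffic is counted as one pass over the data per level (reads and writes are not counted separately, matching the estimate in the question).

```cpp
#include <cmath>
#include <cstdint>

// Number of 2-way merge levels needed for n elements: ceil(log2(n)).
// Hypothetical helper, named here only for illustration.
double merge_levels(std::uint64_t n_elements) {
    return std::ceil(std::log2(static_cast<double>(n_elements)));
}

// Estimated time in seconds for a bandwidth-bound 2-way merge sort.
// Assumption: each level streams the whole data set once at the given
// sequential bandwidth (bytes per second); compute cost is ignored.
double merge_sort_seconds(std::uint64_t n_elements,
                          std::uint64_t element_bytes,
                          double bandwidth_bytes_per_s) {
    const double bytes_per_level =
        static_cast<double>(n_elements) * static_cast<double>(element_bytes);
    return merge_levels(n_elements) * bytes_per_level / bandwidth_bytes_per_s;
}

// Time in seconds to move a buffer over a link of the given bandwidth.
double transfer_seconds(double bytes, double bandwidth_bytes_per_s) {
    return bytes / bandwidth_bytes_per_s;
}
```

With 128 000 000 elements of 8 bytes, `merge_sort_seconds(128000000, 8, 30e9)` gives roughly 0.9 s and `merge_sort_seconds(128000000, 8, 150e9)` roughly 0.18 s, so the ratio is simply the ratio of the two bandwidths.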

I expect an even larger speedup from a more-than-2-way merge sort, or from some other sort better suited to GPUs.
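To make the more-than-2-way idea concrete, here is a minimal CPU-side sketch of a k-way merge using a min-heap. It is not GPU code and not taken from any particular library; `k_way_merge` is a hypothetical name, and the in-memory vectors stand in for sorted runs that would be streamed from disk in the 100GB case. Merging k runs at once reduces the number of passes over the data from about log2 of the number of runs to about log_k of it.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Merge k already-sorted runs into one sorted sequence.
// Each run is read strictly front to back, so memory access per run is sequential.
std::vector<std::uint64_t> k_way_merge(
        const std::vector<std::vector<std::uint64_t>>& runs) {
    using Entry = std::pair<std::uint64_t, std::size_t>;  // (value, run index)
    // Min-heap holding the current head element of every non-exhausted run.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    std::vector<std::size_t> next(runs.size(), 0);  // next unread index per run

    std::size_t total = 0;
    for (std::size_t r = 0; r < runs.size(); ++r) {
        total += runs[r].size();
        if (!runs[r].empty()) {
            heap.push(Entry(runs[r][0], r));
            next[r] = 1;
        }
    }

    std::vector<std::uint64_t> out;
    out.reserve(total);
    while (!heap.empty()) {
        const Entry smallest = heap.top();
        heap.pop();
        out.push_back(smallest.first);
        const std::size_t r = smallest.second;
        // Refill the heap from the run that just supplied the smallest element.
        if (next[r] < runs[r].size()) {
            heap.push(Entry(runs[r][next[r]], r));
            ++next[r];
        }
    }
    return out;
}
```

Each output element costs O(log k) heap work, so for large k the merge can become compute-bound rather than bandwidth-bound; that trade-off would need to be measured.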

Upvotes: 3

Views: 2690

Answers (1)

Dirk is no longer here

Reputation: 368399

Have you looked at Thrust?

From the project page:

Thrust is a parallel algorithms library which resembles the C++ Standard Template Library (STL). Thrust's high-level interface greatly enhances developer productivity while enabling performance portability between GPUs and multicore CPUs. Interoperability with established technologies (such as CUDA, TBB and OpenMP) facilitates integration with existing software. Develop high-performance applications rapidly with Thrust!

The license is Apache, so it should suit you.

Upvotes: 3
