Reputation: 510
I'm trying to implement the Merge Sort algorithm on CUDA which is mentioned on the Designing Efficient Sorting Algorithms for Manycore GPUs.
For this as in the intermediate level, It has suggested to assign a thread block to merge two sorted arrays (say A and B) to a single array. And the idea is to assign a thread to data - a_i on the array A and find its location on B array using Binary Search. and if the position of a_i on B is j the a_i's position on new array is (i+j)
But when I implemented this (or tried to), I found a scenario that above method doesn't seems to work. For example if the two arrays that need to be merged are as follows,
and
So that the 53 on the first array (shaded in gray one) will find that its respective position on the second array is 11, So its position on the final array is (11 + 11 = 22). And likewise position of 53 on the first array which is shaded in blue is (11 + 12 = 23).
If we take the 53 of the second array its final position is also given as (11 + 11 = 22). Which conflicts with the grey color 53 on the first array.
So according to my understanding a simple binary search algorithm cannot be used to merge these two arrays. Is there a known or easier method to resolve this conflict?
Upvotes: 1
Views: 1878
Reputation: 1
Sorry I'm a bit late, but I just try to implement the same algorithm on a FPGA.
It should work if you just use <= comparison to binary sort A into B and < comparison if you sort B into A. That way you have a guaranteed order that elements from A will always be lower in the index than elements from B even if they are the same.
In your example the positions would be:
Sidenote: I would recommend start counting from 0 for the indexing (I choose the same counting as you for clarity). Like this the first index in the Array C is 1 + 1 = 2
Upvotes: 0
Reputation: 114
In the paper, I read:
Given the sorted sequences A and B, we want to compute the sorted sequence C = merge(A, B). If these sequences are sufficiently small, say of size no greater than t = 256 elements each, we can merge them using a single t-thread block. For an element ai ∈ A, we need only compute rank(ai , C), which is the position of element ai in the merged sequence C. Because both A and B are sorted, we know that rank(ai , C) = i + rank(ai , B), where rank(ai , B) is simply the count of elements bj ∈ B with bj < ai and which we compute efficiently using binary search. Elements of B can obviously be treated in the same way.
You have to deal with duplicates in the binary search, when you are looking for rank(ai, B). Imagine B as a BST, rank(ai, B) = max of j {bj <= ai} +1 should work well.
Upvotes: 1