Reputation: 103
I am writing HLS (high-level synthesis) code and need to find the top N elements in an array of length K.
I could do a full bitonic sort (8 elements:
bitonic sorting network )
and then truncate the array at 32 elements.
However this is less efficient than doing a partial sort, which stops when the top N elements are found. This would avoid unnecessarily sorting the other K-N elements.
The top N also don't need to be sorted, so this can also be skipped.
How would I change the typical bitonic sort to meet these requirements?
Upvotes: 3
Views: 122
Reputation: 2497
TL;DR: If you want selection (select top 32 of ???), use a selection mechanism, not an ordering("sorting") mechanism.
If something like below top-5-of-32 (derived from Smallest and fastest 32-sorting network) is what you're after,
(103 CEs (Compare-Exchange elements, 2-sorters - whatever):
of a total of 185, 82 can be omitted, and 27 need only one output),
you don't need outputs/signals/circuitry you don't use.
Looking from the outputs, you do need outputs in the wanted range where the input is in the unwanted one, making that input "channel" (horizontal line) needed.
And every output feeding a needed channel.
Of the regular networks with O(nlog2n) CEs, bitonic uses more CEs than odd–even mergesort or pairwise even for full sorts.
It distributes the used property fast as can be, so gains from not using some of the outputs are minimal.
On the other hand, it needs less signal routing (see how "crammed" each of the below diagrams of select 8-of-64 looks), and for an even number of values to select, it uses less layers. Combined with less routing, this results in being faster.
Check out Parberry's pairwise sorting networks using the same number of CEs and layers for a full sorter as Batcher's odd–even mergesort:
Pairwise Networks are Superior for Selection
Moshe Zazon-Ivry, Michael Codish
One selection algorithm I think I understood is presented in
Kelong Cong, Robin Geelen, Jiayi Kang, and Jeongeun Park:
"Revisiting Oblivious Top-𝑘 Selection with
Applications to Secure 𝑘-NN Classification":
the main idea is to use a specialised merge in a merge-based sorting network, e.g. even-odd merge.
(See the paper for intricate further improvements for selecting fewer than about twice the square root of the number of input values.)
Not using "the min-output" of a 2-sorter allows using "a max", red lower dot. Sorters that may be omitted are all red above, or orange below where needed by a partial sort.
Cost figures in a table below the diagrams.
bitonic:
odd-even merge:
pairwise:
Bert Dobbelaere 4×16+merge:
Selection network from 8 8-sorters followed by 4+2 order-preserving top8-mergers and one more non-preserving merger:
The last j layers of a bitonic sorter have CEs from channel i×2j to the next above and none to the previous one: when selecting 2n values, the network doesn't need the last log2n layers. Generall the number of dispensable layers is the number of trailing zeroes in the binary representation of the number of values to select.
The last layer of pairwise & odd-even merge sorters has CEs from odd channels to the even one above: when selecting an odd number of values, the network doesn't need the last layer.
A bitonic "selector" for few elements has about the same number of CEs as a full pairwise or odd-even merge sorter.
"Hand crafted" small sorters have a few CEs less than pairwise, but many more CEs can be omitted.
The numbers for pairwise are closer to hand crafted ones than to odd-even merge.
Table for larger numbers of inputs:
spanned is the number of inter-channel gaps spanned by used CEs, some measure of additional "vertical signal routing". Relative impact of #CEs and spanned will very much depend on implementation technique.
construction | #CEs | select | inputs | total | layers | omitted | maxes | spanned |
---|---|---|---|---|---|---|---|---|
dobbelaere32 | 103 | 5 | 32 | 185 | 13 | 82 | 27 | 538 |
pairwise32 | 91 | 2 | 32 | 191 | 15 | 100 | 27 | 532 |
bitonic2 | 190 | 2 | 32 | 240 | 14 | 50 | 30 | 756 |
pairwise32 | 90 | 3 | 32 | 191 | 14 | 101 | 26 | 531 |
bitonic2 | 191 | 3 | 32 | 240 | 15 | 49 | 29 | 757 |
pairwise32 | 118 | 4 | 32 | 191 | 15 | 73 | 26 | 749 |
bitonic2 | 188 | 4 | 32 | 240 | 13 | 52 | 28 | 752 |
pairwise32 | 115 | 5 | 32 | 191 | 14 | 76 | 25 | 743 |
bitonic2 | 191 | 5 | 32 | 240 | 15 | 49 | 27 | 757 |
pairwise32 | 119 | 6 | 32 | 191 | 15 | 72 | 24 | 769 |
bitonic2 | 190 | 6 | 32 | 240 | 14 | 50 | 26 | 756 |
pairwise64 | 302 | 6 | 64 | 543 | 21 | 241 | 51 | 3476 |
bitonic2 | 542 | 6 | 64 | 672 | 20 | 130 | 58 | 3188 |
pairwise64 | 300 | 7 | 64 | 543 | 20 | 243 | 50 | 3472 |
bitonic2 | 543 | 7 | 64 | 672 | 21 | 129 | 57 | 3189 |
sort_merge2 | 280 | 8 | 64 | 280 | 15 | 0 | 56 | 1376 |
dobbelaere64 | 310 | 8 | 64 | 525 | 20 | 215 | 53 | 3292 |
pairwise64 | 340 | 8 | 64 | 543 | 21 | 203 | 52 | 4011 |
odd-even merge064 | 389 | 8 | 64 | 543 | 21 | 154 | 52 | 2099 |
odd-even merge64 | 389 | 8 | 64 | 543 | 21 | 154 | 52 | 4815 |
bitonic2 | 536 | 8 | 64 | 672 | 18 | 136 | 56 | 3168 |
pairwise64 | 334 | 9 | 64 | 543 | 20 | 209 | 51 | 3988 |
bitonic2 | 543 | 9 | 64 | 672 | 21 | 129 | 55 | 3189 |
pairwise64 | 339 | 10 | 64 | 543 | 21 | 204 | 50 | 4045 |
bitonic2 | 542 | 10 | 64 | 672 | 20 | 130 | 54 | 3188 |
pairwise128 | 853 | 8 | 128 | 1471 | 28 | 618 | 105 | 18473 |
bitonic2 | 1464 | 8 | 128 | 1792 | 25 | 328 | 120 | 13120 |
pairwise256 | 2283 | 16 | 256 | 3839 | 36 | 1556 | 209 | 92885 |
bitonic2 | 3824 | 16 | 256 | 4608 | 32 | 784 | 240 | 53376 |
pairwise512 | 5656 | 16 | 512 | 9727 | 45 | 4071 | 412 | 430719 |
bitonic2 | 9712 | 16 | 512 | 11520 | 41 | 1808 | 496 | 215808 |
pairwise1024 | 14129 | 30 | 1024 | 24063 | 55 | 9934 | 813 | 2009736 |
bitonic2 | 24062 | 30 | 1024 | 28160 | 54 | 4098 | 994 | 868180 |
pairwise1024 | 14125 | 31 | 1024 | 24063 | 54 | 9938 | 812 | 2009710 |
bitonic2 | 24063 | 31 | 1024 | 28160 | 55 | 4097 | 993 | 868181 |
sort_merge2 | 9504 9312 |
32 | 1024 | 9504 | 40 39 |
0 | 992 | 168192 160896 |
pairwise1024 | 14414 | 32 | 1024 | 24063 | 55 | 9649 | 820 | 2055383 |
odd-even merge01024 | 17461 | 32 | 1024 | 24063 | 55 | 6602 | 816 | 464199 |
odd-even merge1024 | 17461 | 32 | 1024 | 24063 | 55 | 6602 | 816 | 2436543 |
bitonic2 | 24032 | 32 | 1024 | 28160 | 50 | 4128 | 992 | 867840 |
pairwise1024 | 14399 | 33 | 1024 | 24063 | 54 | 9664 | 819 | 2055182 |
bitonic2 | 24063 | 33 | 1024 | 28160 | 55 | 4097 | 991 | 868181 |
pairwise1024 | 14408 | 34 | 1024 | 24063 | 55 | 9655 | 818 | 2056195 |
bitonic2 | 24062 | 34 | 1024 | 28160 | 54 | 4098 | 990 | 868180 |
Upvotes: 4
Reputation: 349
The Musser Introselect algorithm does exactly what you need. Its worst-case complexity is linear in K and doesn't depend on N. I believe it's implemented in Python as numpy.partition
.
There are also other Introselect algorithms, like the one implemented in C++ as std::nth_element
which has O(K log K) worst-case complexity and O(K) average/realistic complexity.
(Note that the usual definition of N in introselect algorithms is flipped - it corresponds to your K-N. The resulting array has N smallest elements on the left and K-N largest on the right. More precisely, all the leftmost N elements <= all the rightmost K-N elements. The elements within each part can be in any order.)
Unfortunately there's no such thing as stopping when the largest N elements are found, since it's necessary to look at all K elements to know it. The introselect algorithms don't "pick elements", they perform partial sorting, but that's optimal in terms of algorithmic complexity.
Upvotes: -1