physicist1911
physicist1911

Reputation: 103

Bitonic sort: how can I find the top N elements from an array of length K, without doing a full sort - stopping when the top N are found?

I am writing HLS (high-level synthesis) code and need to find the top N elements in an array of length K.
I could do a full bitonic sort (8 elements: bitonic sorting network )
and then truncate the array at 32 elements.

However this is less efficient than doing a partial sort, which stops when the top N elements are found. This would avoid unnecessarily sorting the other K-N elements.
The top N also don't need to be sorted, so this can also be skipped.

How would I change the typical bitonic sort to meet these requirements?

Upvotes: 3

Views: 122

Answers (2)

greybeard
greybeard

Reputation: 2497

TL;DR: If you want selection (select top 32 of ???), use a selection mechanism, not an ordering("sorting") mechanism.

If something like below top-5-of-32 (derived from Smallest and fastest 32-sorting network) is what you're after,
merge-select top 5 of 32
(103 CEs (Compare-Exchange elements, 2-sorters - whatever):
of a total of 185, 82 can be omitted, and 27 need only one output),
you don't need outputs/signals/circuitry you don't use.
Looking from the outputs, you do need outputs in the wanted range where the input is in the unwanted one, making that input "channel" (horizontal line) needed.
And every output feeding a needed channel.

Of the regular networks with O(nlog2n) CEs, bitonic uses more CEs than odd–even mergesort or pairwise even for full sorts.
It distributes the used property fast as can be, so gains from not using some of the outputs are minimal.
On the other hand, it needs less signal routing (see how "crammed" each of the below diagrams of select 8-of-64 looks), and for an even number of values to select, it uses less layers. Combined with less routing, this results in being faster.

Check out Parberry's pairwise sorting networks using the same number of CEs and layers for a full sorter as Batcher's odd–even mergesort:
Pairwise Networks are Superior for Selection Moshe Zazon-Ivry, Michael Codish

One selection algorithm I think I understood is presented in
Kelong Cong, Robin Geelen, Jiayi Kang, and Jeongeun Park:
"Revisiting Oblivious Top-𝑘 Selection with Applications to Secure 𝑘-NN Classification":
the main idea is to use a specialised merge in a merge-based sorting network, e.g. even-odd merge.
(See the paper for intricate further improvements for selecting fewer than about twice the square root of the number of input values.)

Not using "the min-output" of a 2-sorter allows using "a max", red lower dot. Sorters that may be omitted are all red above, or orange below where needed by a partial sort.
Cost figures in a table below the diagrams.

bitonic:
bitonic select 8 of 64
odd-even merge:
odd-even merge select 8 of 64
pairwise:
pairwise select 8 of 64
Bert Dobbelaere 4×16+merge:
select 8 of 64 from hand crafted 64-sorter


Selection network from 8 8-sorters followed by 4+2 order-preserving top8-mergers and one more non-preserving merger: sort8 + merge top8

The last j layers of a bitonic sorter have CEs from channel i×2j to the next above and none to the previous one: when selecting 2n values, the network doesn't need the last log2n layers. Generall the number of dispensable layers is the number of trailing zeroes in the binary representation of the number of values to select.
The last layer of pairwise & odd-even merge sorters has CEs from odd channels to the even one above: when selecting an odd number of values, the network doesn't need the last layer.

A bitonic "selector" for few elements has about the same number of CEs as a full pairwise or odd-even merge sorter. "Hand crafted" small sorters have a few CEs less than pairwise, but many more CEs can be omitted.
The numbers for pairwise are closer to hand crafted ones than to odd-even merge.

Table for larger numbers of inputs:
spanned is the number of inter-channel gaps spanned by used CEs, some measure of additional "vertical signal routing". Relative impact of #CEs and spanned will very much depend on implementation technique.

construction #CEs select inputs total layers omitted maxes spanned
dobbelaere32 103 5 32 185 13 82 27 538
pairwise32 91 2 32 191 15 100 27 532
bitonic2 190 2 32 240 14 50 30 756
pairwise32 90 3 32 191 14 101 26 531
bitonic2 191 3 32 240 15 49 29 757
pairwise32 118 4 32 191 15 73 26 749
bitonic2 188 4 32 240 13 52 28 752
pairwise32 115 5 32 191 14 76 25 743
bitonic2 191 5 32 240 15 49 27 757
pairwise32 119 6 32 191 15 72 24 769
bitonic2 190 6 32 240 14 50 26 756
pairwise64 302 6 64 543 21 241 51 3476
bitonic2 542 6 64 672 20 130 58 3188
pairwise64 300 7 64 543 20 243 50 3472
bitonic2 543 7 64 672 21 129 57 3189
sort_merge2 280 8 64 280 15 0 56 1376
dobbelaere64 310 8 64 525 20 215 53 3292
pairwise64 340 8 64 543 21 203 52 4011
odd-even merge064 389 8 64 543 21 154 52 2099
odd-even merge64 389 8 64 543 21 154 52 4815
bitonic2 536 8 64 672 18 136 56 3168
pairwise64 334 9 64 543 20 209 51 3988
bitonic2 543 9 64 672 21 129 55 3189
pairwise64 339 10 64 543 21 204 50 4045
bitonic2 542 10 64 672 20 130 54 3188
pairwise128 853 8 128 1471 28 618 105 18473
bitonic2 1464 8 128 1792 25 328 120 13120
pairwise256 2283 16 256 3839 36 1556 209 92885
bitonic2 3824 16 256 4608 32 784 240 53376
pairwise512 5656 16 512 9727 45 4071 412 430719
bitonic2 9712 16 512 11520 41 1808 496 215808
pairwise1024 14129 30 1024 24063 55 9934 813 2009736
bitonic2 24062 30 1024 28160 54 4098 994 868180
pairwise1024 14125 31 1024 24063 54 9938 812 2009710
bitonic2 24063 31 1024 28160 55 4097 993 868181
sort_merge2 9504
9312
32 1024 9504 40
39
0 992 168192
160896
pairwise1024 14414 32 1024 24063 55 9649 820 2055383
odd-even merge01024 17461 32 1024 24063 55 6602 816 464199
odd-even merge1024 17461 32 1024 24063 55 6602 816 2436543
bitonic2 24032 32 1024 28160 50 4128 992 867840
pairwise1024 14399 33 1024 24063 54 9664 819 2055182
bitonic2 24063 33 1024 28160 55 4097 991 868181
pairwise1024 14408 34 1024 24063 55 9655 818 2056195
bitonic2 24062 34 1024 28160 54 4098 990 868180

Upvotes: 4

Xellos
Xellos

Reputation: 349

The Musser Introselect algorithm does exactly what you need. Its worst-case complexity is linear in K and doesn't depend on N. I believe it's implemented in Python as numpy.partition.

There are also other Introselect algorithms, like the one implemented in C++ as std::nth_element which has O(K log K) worst-case complexity and O(K) average/realistic complexity.

(Note that the usual definition of N in introselect algorithms is flipped - it corresponds to your K-N. The resulting array has N smallest elements on the left and K-N largest on the right. More precisely, all the leftmost N elements <= all the rightmost K-N elements. The elements within each part can be in any order.)

Unfortunately there's no such thing as stopping when the largest N elements are found, since it's necessary to look at all K elements to know it. The introselect algorithms don't "pick elements", they perform partial sorting, but that's optimal in terms of algorithmic complexity.

Upvotes: -1

Related Questions