Bitonic sort: how can I find the top N elements from an array of length K, without doing a full sort - stopping when the top N are found?

I am writing HLS (high-level synthesis) code and need to find the top N elements in an array of length K.
I could do a full bitonic sort (8 elements: bitonic sorting network )
and then truncate the array at 32 elements.

However this is less efficient than doing a partial sort, which stops when the top N elements are found. This would avoid unnecessarily sorting the other K-N elements.
The top N also don't need to be sorted, so this can also be skipped.

How would I change the typical bitonic sort to meet these requirements?

Upvotes: 3

Answers (2)

greybeard

Reputation: 2497

TL;DR: If you want selection (select top 32 of ???), use a selection mechanism, not an ordering("sorting") mechanism.

If something like below top-5-of-32 (derived from Smallest and fastest 32-sorting network) is what you're after,

(103 CEs (Compare-Exchange elements, 2-sorters - whatever):
of a total of 185, 82 can be omitted, and 27 need only one output),
you don't need outputs/signals/circuitry you don't use.
Looking from the outputs, you do need outputs in the wanted range where the input is in the unwanted one, making that input "channel" (horizontal line) needed.
And every output feeding a needed channel.

Of the regular networks with O(nlog²n) CEs, bitonic uses more CEs than odd–even mergesort or pairwise even for full sorts.
It distributes the used property fast as can be, so gains from not using some of the outputs are minimal.
On the other hand, it needs less signal routing (see how "crammed" each of the below diagrams of select 8-of-64 looks), and for an even number of values to select, it uses less layers. Combined with less routing, this results in being faster.

Check out Parberry's pairwise sorting networks using the same number of CEs and layers for a full sorter as Batcher's odd–even mergesort:
Pairwise Networks are Superior for Selection Moshe Zazon-Ivry, Michael Codish

One selection algorithm I think I understood is presented in
Kelong Cong, Robin Geelen, Jiayi Kang, and Jeongeun Park:
"Revisiting Oblivious Top-𝑘 Selection with Applications to Secure 𝑘-NN Classification":
the main idea is to use a specialised merge in a merge-based sorting network, e.g. even-odd merge.
(See the paper for intricate further improvements for selecting fewer than about twice the square root of the number of input values.)

Not using "the min-output" of a 2-sorter allows using "a max", red lower dot. Sorters that may be omitted are all red above, or orange below where needed by a partial sort.
Cost figures in a table below the diagrams.

bitonic:

odd-even merge:

pairwise:

Bert Dobbelaere 4×16+merge:

Selection network from 8 8-sorters followed by 4+2 order-preserving top8-mergers and one more non-preserving merger:

The last j layers of a bitonic sorter have CEs from channel i×2^j to the next above and none to the previous one: when selecting 2ⁿ values, the network doesn't need the last log₂n layers. Generall the number of dispensable layers is the number of trailing zeroes in the binary representation of the number of values to select.
The last layer of pairwise & odd-even merge sorters has CEs from odd channels to the even one above: when selecting an odd number of values, the network doesn't need the last layer.

A bitonic "selector" for few elements has about the same number of CEs as a full pairwise or odd-even merge sorter. "Hand crafted" small sorters have a few CEs less than pairwise, but many more CEs can be omitted.
The numbers for pairwise are closer to hand crafted ones than to odd-even merge.

Table for larger numbers of inputs:
spanned is the number of inter-channel gaps spanned by used CEs, some measure of additional "vertical signal routing". Relative impact of #CEs and spanned will very much depend on implementation technique.

construction	#CEs	select	inputs	total	layers	omitted	maxes	spanned
dobbelaere32	103	5	32	185	13	82	27	538
pairwise32	91	2	32	191	15	100	27	532
bitonic2	190	2	32	240	14	50	30	756
pairwise32	90	3	32	191	14	101	26	531
bitonic2	191	3	32	240	15	49	29	757
pairwise32	118	4	32	191	15	73	26	749
bitonic2	188	4	32	240	13	52	28	752
pairwise32	115	5	32	191	14	76	25	743
bitonic2	191	5	32	240	15	49	27	757
pairwise32	119	6	32	191	15	72	24	769
bitonic2	190	6	32	240	14	50	26	756
pairwise64	302	6	64	543	21	241	51	3476
bitonic2	542	6	64	672	20	130	58	3188
pairwise64	300	7	64	543	20	243	50	3472
bitonic2	543	7	64	672	21	129	57	3189
sort_merge2	280	8	64	280	15	0	56	1376
dobbelaere64	310	8	64	525	20	215	53	3292
pairwise64	340	8	64	543	21	203	52	4011
odd-even merge064	389	8	64	543	21	154	52	2099
odd-even merge64	389	8	64	543	21	154	52	4815
bitonic2	536	8	64	672	18	136	56	3168
pairwise64	334	9	64	543	20	209	51	3988
bitonic2	543	9	64	672	21	129	55	3189
pairwise64	339	10	64	543	21	204	50	4045
bitonic2	542	10	64	672	20	130	54	3188
pairwise128	853	8	128	1471	28	618	105	18473
bitonic2	1464	8	128	1792	25	328	120	13120
pairwise256	2283	16	256	3839	36	1556	209	92885
bitonic2	3824	16	256	4608	32	784	240	53376
pairwise512	5656	16	512	9727	45	4071	412	430719
bitonic2	9712	16	512	11520	41	1808	496	215808
pairwise1024	14129	30	1024	24063	55	9934	813	2009736
bitonic2	24062	30	1024	28160	54	4098	994	868180
pairwise1024	14125	31	1024	24063	54	9938	812	2009710
bitonic2	24063	31	1024	28160	55	4097	993	868181
sort_merge2	9504 9312	32	1024	9504	40 39	0	992	168192 160896
pairwise1024	14414	32	1024	24063	55	9649	820	2055383
odd-even merge01024	17461	32	1024	24063	55	6602	816	464199
odd-even merge1024	17461	32	1024	24063	55	6602	816	2436543
bitonic2	24032	32	1024	28160	50	4128	992	867840
pairwise1024	14399	33	1024	24063	54	9664	819	2055182
bitonic2	24063	33	1024	28160	55	4097	991	868181
pairwise1024	14408	34	1024	24063	55	9655	818	2056195
bitonic2	24062	34	1024	28160	54	4098	990	868180

Upvotes: 4

Xellos

Reputation: 349

The Musser Introselect algorithm does exactly what you need. Its worst-case complexity is linear in K and doesn't depend on N. I believe it's implemented in Python as numpy.partition.

There are also other Introselect algorithms, like the one implemented in C++ as std::nth_element which has O(K log K) worst-case complexity and O(K) average/realistic complexity.

(Note that the usual definition of N in introselect algorithms is flipped - it corresponds to your K-N. The resulting array has N smallest elements on the left and K-N largest on the right. More precisely, all the leftmost N elements <= all the rightmost K-N elements. The elements within each part can be in any order.)

Unfortunately there's no such thing as stopping when the largest N elements are found, since it's necessary to look at all K elements to know it. The introselect algorithms don't "pick elements", they perform partial sorting, but that's optimal in terms of algorithmic complexity.

Upvotes: -1

Bitonic sort: how can I find the top N elements from an array of length K, without doing a full sort - stopping when the top N are found?

Answers (2)

Related Questions