Reputation: 5103
What is the fastest way to find the k largest elements in an array in order (i.e. starting from the largest element to the kth largest element)?
Upvotes: 7
Views: 8270
Reputation: 109557
There was a question on performance & restricted resources.
Make a value class for the top 3 values. Use such an accumulator for reduction in a parallel stream. Limit the parallelism according to the context (memory, power).
class BronzeSilverGold {
int[] values = new int[] {Integer.MIN_VALUE, Integer.MIN_VALUE, Integer.MIN_VALUE};
// For reduction
void add(int x) {
...
}
// For combining two results of two threads.
void merge(BronzeSilverGold other) {
...
}
}
The parallelism must be restricted in your constellation, hence specify an N_THREADS in:
try {
ForkJoinPool threadPool = new ForkJoinPool(N_THREADS);
threadPool.submit(() -> {
BronzeSilverGold result = IntStream.of(...).parallel().collect(
BronzeSilverGold::new,
(bsg, n) -> BronzeSilverGold::add,
(bsg1, bsg2) -> bsg1.merge(bsg2));
...
});
} catch (InterruptedException | ExecutionException e) {
prrtl();
}
Upvotes: 0
Reputation: 678
Here's a solution with O(N + k lg k)
complexity.
int[] kLargest_Dremio(int[] A, int k) {
int[] result = new int[k];
shouldGetIndex = true;
int q = AreIndicesValid(0, A.Length - 1) ? RandomizedSelet(0, A.Length-1,
A.Length-k+1) : -1;
Array.Copy(A, q, result, 0, k);
Array.Sort(result, (a, b) => { return a>b; });
return result;
}
AreIndicesValid
and RandomizedSelet
are defined in this github source file.
Upvotes: 0
Reputation: 31
1) Build a Max Heap tree in O(n)
2) Use Extract Max k times to get k maximum elements from the Max Heap O(klogn)
Time complexity: O(n + klogn)
A C++ implementation using STL is given below:
#include <iostream>
#include<bits/stdc++.h>
using namespace std;
int main() {
int arr[] = {4,3,7,12,23,1,8,5,9,2};
//Lets extract 3 maximum elements
int k = 3;
//First convert the array to a vector to use STL
vector<int> vec;
for(int i=0;i<10;i++){
vec.push_back(arr[i]);
}
//Build heap in O(n)
make_heap(vec.begin(), vec.end());
//Extract max k times
for(int i=0;i<k;i++){
cout<<vec.front()<<" ";
pop_heap(vec.begin(),vec.end());
vec.pop_back();
}
return 0;
}
Upvotes: 2
Reputation: 21
Radix sort solution:
Time complexity: O(N*L), where L = length of the largest element, can assume L = O(1). Space used: O(N) for radix sort.
However, I think radix sort has costly overhead, making its linear time complexity less attractive.
Upvotes: 2
Reputation: 637
C++ also provides the partial_sort algorithm, which solves the problem of selecting the smallest k elements (sorted), with a time complexity of O(n log k). No algorithm is provided for selecting the greatest k elements since this should be done by inverting the ordering predicate.
For Perl, the module Sort::Key::Top, available from CPAN, provides a set of functions to select the top n elements from a list using several orderings and custom key extraction procedures. Furthermore, the Statistics::CaseResampling module provides a function to calculate quantiles using quickselect.
Python's standard library (since 2.4) includes heapq.nsmallest() and nlargest(), returning sorted lists, the former in O(n + k log n) time, the latter in O(n log k) time.
Upvotes: 1
Reputation: 1569
@templatetypedef's solution is probably the fastest one, assuming you can modify or copy input.
Alternatively, you can use heap or BST (set
in C++) to store k largest elements at given moment, then read array's elements one by one. While this is O(n lg k), it doesn't modify input and only uses O(k) additional memory. It also works on streams (when you don't know all the data from the beginning).
Upvotes: 0
Reputation: 372814
One option would be the following:
Using a linear-time selection algorithm like median-of-medians or introsort, find the kth largest element and rearrange the elements so that all elements from the kth element forward are greater than the kth element.
Sort all elements from the kth forward using a fast sorting algorithm like heapsort or quicksort.
Step (1) takes time O(n), and step (2) takes time O(k log k). Overall, the algorithm runs in time O(n + k log k), which is very, very fast.
Hope this helps!
Upvotes: 13