user1742188
user1742188

Reputation: 5103

Find the k largest elements in order

What is the fastest way to find the k largest elements in an array in order (i.e. starting from the largest element to the kth largest element)?

Upvotes: 7

Views: 8270

Answers (7)

Joop Eggen
Joop Eggen

Reputation: 109557

There was a question on performance & restricted resources.

Make a value class for the top 3 values. Use such an accumulator for reduction in a parallel stream. Limit the parallelism according to the context (memory, power).

class BronzeSilverGold {
    int[] values = new int[] {Integer.MIN_VALUE, Integer.MIN_VALUE, Integer.MIN_VALUE};

    // For reduction
    void add(int x) {
        ...
    }

     // For combining two results of two threads.
    void merge(BronzeSilverGold other) {
        ...
    }
}

The parallelism must be restricted in your constellation, hence specify an N_THREADS in:

try {
    ForkJoinPool threadPool = new ForkJoinPool(N_THREADS);
    threadPool.submit(() -> {
        BronzeSilverGold result = IntStream.of(...).parallel().collect(
            BronzeSilverGold::new,
            (bsg, n) -> BronzeSilverGold::add,
            (bsg1, bsg2) -> bsg1.merge(bsg2));
        ...
    });
} catch (InterruptedException | ExecutionException e) {
    prrtl();
}

Upvotes: 0

Atiq Rahman
Atiq Rahman

Reputation: 678

Here's a solution with O(N + k lg k) complexity.

int[] kLargest_Dremio(int[] A, int k) {
  int[] result = new int[k];
  shouldGetIndex = true;
  int q = AreIndicesValid(0, A.Length - 1) ? RandomizedSelet(0, A.Length-1,
    A.Length-k+1) : -1;
  Array.Copy(A, q, result, 0, k);
  Array.Sort(result, (a, b) => { return a>b; });
  return result;
} 

AreIndicesValid and RandomizedSelet are defined in this github source file.

Upvotes: 0

chauprin
chauprin

Reputation: 31

1) Build a Max Heap tree in O(n)
2) Use Extract Max k times to get k maximum elements from the Max Heap O(klogn)

Time complexity: O(n + klogn)

A C++ implementation using STL is given below:

#include <iostream>
#include<bits/stdc++.h>

using namespace std;

int main() {

  int arr[] = {4,3,7,12,23,1,8,5,9,2}; 

  //Lets extract 3 maximum elements
    int k = 3;  

    //First convert the array to a vector to use STL
    vector<int> vec;
    for(int i=0;i<10;i++){
        vec.push_back(arr[i]);
    }

  //Build heap in O(n)
  make_heap(vec.begin(), vec.end());

  //Extract max k times
  for(int i=0;i<k;i++){
      cout<<vec.front()<<" ";
      pop_heap(vec.begin(),vec.end());
      vec.pop_back();
  }
  return 0;
}

Upvotes: 2

ninhnn
ninhnn

Reputation: 21

Radix sort solution:

  • Sort the array in descending order, using radix sort;
  • Print first K elements.

Time complexity: O(N*L), where L = length of the largest element, can assume L = O(1). Space used: O(N) for radix sort.

However, I think radix sort has costly overhead, making its linear time complexity less attractive.

Upvotes: 2

Raghavan
Raghavan

Reputation: 637

C++ also provides the partial_sort algorithm, which solves the problem of selecting the smallest k elements (sorted), with a time complexity of O(n log k). No algorithm is provided for selecting the greatest k elements since this should be done by inverting the ordering predicate.

For Perl, the module Sort::Key::Top, available from CPAN, provides a set of functions to select the top n elements from a list using several orderings and custom key extraction procedures. Furthermore, the Statistics::CaseResampling module provides a function to calculate quantiles using quickselect.

Python's standard library (since 2.4) includes heapq.nsmallest() and nlargest(), returning sorted lists, the former in O(n + k log n) time, the latter in O(n log k) time.

Upvotes: 1

rburny
rburny

Reputation: 1569

@templatetypedef's solution is probably the fastest one, assuming you can modify or copy input.

Alternatively, you can use heap or BST (set in C++) to store k largest elements at given moment, then read array's elements one by one. While this is O(n lg k), it doesn't modify input and only uses O(k) additional memory. It also works on streams (when you don't know all the data from the beginning).

Upvotes: 0

templatetypedef
templatetypedef

Reputation: 372814

One option would be the following:

  1. Using a linear-time selection algorithm like median-of-medians or introsort, find the kth largest element and rearrange the elements so that all elements from the kth element forward are greater than the kth element.

  2. Sort all elements from the kth forward using a fast sorting algorithm like heapsort or quicksort.

Step (1) takes time O(n), and step (2) takes time O(k log k). Overall, the algorithm runs in time O(n + k log k), which is very, very fast.

Hope this helps!

Upvotes: 13

Related Questions