Reputation: 11
So I'm a total newbie to Java and coding in general, having just learnt Big-O as well. I came across this on the internet yesterday (http://www.dsalgo.com/2013/02/MaxKUsingMinHeap.php.html), and would like to know whether the complexity analysis [O(n log k)] of the code below is correct. Does it also cover the worst case? I'd really appreciate it if someone could go through this and explain.
import java.util.PriorityQueue;

public class MaxKUsingMinHeap {

    public static void main(String[] args) {
        int[] arr = {
            3, 46, 2, 56, 3, 38, 93, 45, 6, 787, 34, 76, 44, 6, 7, 86, 8, 44, 56
        };
        int[] result = getTopElements(arr, 5);
        for (int i : result) {
            System.out.print(i + ",");
        }
    }

    public static int[] getTopElements(int[] arr, int k) {
        PriorityQueue<Integer> minHeap = new PriorityQueue<Integer>();
        for (int i = 0; i < arr.length; ++i) {
            int currentNum = arr[i];
            if (minHeap.size() < k) {
                minHeap.add(currentNum);
            } else if (currentNum > minHeap.peek()) {
                minHeap.poll();
                minHeap.add(currentNum);
            }
        }
        int[] result = new int[minHeap.size()];
        int index = 0;
        while (!minHeap.isEmpty()) {
            result[index++] = minHeap.poll();
        }
        return result;
    }
}
Upvotes: 0
Views: 192
Reputation: 181824
The asymptotic complexity of the program you presented depends on the details of the PriorityQueue implementation, and those details are not documented. Suppose, however, that the implementation is optimal in the number of operations performed by each method (in both the average and the worst case):

(constructor): O(1)
size(): O(1)
add(): O(log k)
peek(): O(1)
poll(): O(log k)
isEmpty(): O(1)

where k is the number of elements currently in the queue. In particular, these are the characteristics of a queue backed by a "heap" data structure, which the variable naming appears to assume is the implementation (and a very reasonable assumption it is).
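For intuition about where those O(log k) costs come from, here is a minimal sketch of an array-backed binary min-heap. This is not the actual java.util.PriorityQueue source, and the names (IntMinHeap, siftUp, siftDown) are invented for illustration: add() sifts the new element up and poll() sifts the replacement root down, each moving at most one level of the tree per step, so both are bounded by the tree height, which is O(log k) for k elements.

import java.util.Arrays;

// Illustrative array-backed binary min-heap (sketch only; names are made up).
// For simplicity, peek() and poll() assume the heap is non-empty.
public class IntMinHeap {
    private int[] heap = new int[8];
    private int size = 0;

    public int size() { return size; }        // O(1)
    public int peek() { return heap[0]; }     // root holds the minimum: O(1)

    public void add(int x) {                  // O(log size)
        if (size == heap.length) heap = Arrays.copyOf(heap, size * 2);
        heap[size] = x;
        siftUp(size++);
    }

    public int poll() {                       // O(log size)
        int min = heap[0];
        heap[0] = heap[--size];
        siftDown(0);
        return min;
    }

    // Climbs at most one level per iteration, i.e. at most log2(size) iterations.
    private void siftUp(int i) {
        while (i > 0 && heap[i] < heap[(i - 1) / 2]) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    // Descends at most one level per iteration, i.e. at most log2(size) iterations.
    private void siftDown(int i) {
        while (2 * i + 1 < size) {
            int child = 2 * i + 1;
            if (child + 1 < size && heap[child + 1] < heap[child]) child++;
            if (heap[i] <= heap[child]) break;
            swap(i, child);
            i = child;
        }
    }

    private void swap(int a, int b) { int t = heap[a]; heap[a] = heap[b]; heap[b] = t; }
}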
Now consider the method getTopElements(int[] arr, int k), and let n be arr.length. The method:

- constructs a PriorityQueue, in O(1) operations
- iterates over the n elements of arr, incurring the following costs at each iteration:
    - reading the current element into a local variable requires O(1) operations
    - the conditional tests rely on the size() and peek() methods and on individual comparisons, so they require O(1) operations
    - when an element is added to the queue, the add() costs O(log k), because the number of elements in the queue is prevented from exceeding k. If an element is first removed, then the cost of the removal is also O(log k) operations. Since O(log k) + O(log k) = O(log k), the asymptotic complexity is not increased by the occasional need to first remove an element.
- allocates the result array and an index variable, in O(1) operations
- iterates min(n, k) times over the queue (once for each element), each time removing an element at a cost bounded by O(log k).
For the first loop, the cost is bounded by n * (O(1) + O(1) + O(log k)) = n * O(log k) = O(n log k). For the second loop, it is bounded by min(n, k) * O(log k) = O(n log k) (we could also reduce it to O(k log k), because big-O is an upper bound; O(min(n, k)) is both O(n) and O(k)). Overall, the method therefore requires O(n log k) + O(n log k) = O(n log k) operations.
In addition to that operation, the main method allocates an n-element array (O(1)), initializes its members (O(n)), and iterates over the results, printing each (O(k)). None of those costs exceeds that of the getTopElements() method, so that method's cost dominates the overall cost.
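If you want to sanity-check the bound empirically, one rough approach is to count how many times the queue's comparator is invoked while running the same loop, since that is proportional to the heap work performed. In the sketch below, HeapOpCount and the comparisons counter are names I made up for illustration (the two-argument PriorityQueue constructor taking an initial capacity and a Comparator is standard); it should report a count no larger than a small constant times n * log2(k).

import java.util.PriorityQueue;
import java.util.Random;

public class HeapOpCount {
    static long comparisons = 0;

    public static void main(String[] args) {
        int n = 1_000_000, k = 5;
        int[] arr = new Random(42).ints(n, 0, 1_000_000_000).toArray();

        // Same algorithm as getTopElements, but with a comparator that counts its calls.
        PriorityQueue<Integer> minHeap =
                new PriorityQueue<>(k, (a, b) -> { comparisons++; return Integer.compare(a, b); });

        for (int x : arr) {
            if (minHeap.size() < k) {
                minHeap.add(x);
            } else if (x > minHeap.peek()) {
                minHeap.poll();
                minHeap.add(x);
            }
        }

        // Each add/poll performs O(log k) comparisons and the loop runs n times,
        // so the total should not exceed a small constant times n * log2(k).
        double bound = n * (Math.log(k) / Math.log(2));
        System.out.printf("comparisons = %d, n * log2(k) ~ %.0f%n", comparisons, bound);
    }
}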
Upvotes: 0
Reputation: 198481
Yes, that code will never take longer than O(n log k) no matter what, because priority queue operations take O(log k) each and you're doing at most O(n) of them.
Upvotes: 2