Overall complexity of finding Kth Largest element in Python

Question

I was solving this leetcode problem link, and found an amazing solution using heapq module, the the running time of this function is very less. This is below program:

from itertools import islice
import heapq

def nlargest(n, iterable):
    """Find the n largest elements in a dataset.

    Equivalent to:  sorted(iterable, reverse=True)[:n]
    """
    if n < 0:
        return []
    it = iter(iterable)
    result = list(islice(it, n))
    if not result:
        return result
    heapq.heapify(result)
    _heappushpop = heapq.heappushpop
    for elem in it:
        _heappushpop(result, elem)
    result.sort(reverse=True)
    return result

print nlargest(5, [10, 122, 2, 3, 3, 4, 5, 5, 10, 12, 23, 18, 17, 15, 100, 101])

This algorithm is really clever and you can also do the visualize here LINK

But I am having a hard time understanding the time complexity of the whole algorithm. Here is my analysis, and please correct me if I am wrong!

Time Complexity :

result = list(islice(it, n)) - > O(n)

heapq.heapify(result) -> O(len(result)) 

for elem in it:
        _heappushpop(result, elem)  -> I am confused at this part

result.sort(reverse=True) -> O(len(result)*log(len(result)))

Could anyone help me understand the overall time complexity of the algorithm.

shx2 · Accepted Answer

So you have two relevant paramaters here: n (the number of items to return), and, say, M (the number of items in the dataset).

islice(it, n) -- O(n)
heapify(result) -- O(n), because len(result)=n
for elem in it: _heappushpop(result, elem) -- performing M-N times an operation of O(logn), because len(result) remains n, i.e. (M-N)*logn
result.sort(reverse=True) -- O(n*logn)

Overall:

n + n + (M-n)*logn + n*logn

Resulting with O(M*logn). You can easily see the dominant part is the heappushpop loop (assuming M>>n, otherwise the problem is less interesting, because the solution is more or less reduced to sorting).

It is worth pointing out there are linear-time algorithms for solving this problem, so if your dataset is very big, it is worth checking them out.

Overall complexity of finding Kth Largest element in Python

Answers (1)

Related Questions