Reputation: 4521
I was solving this leetcode
problem link, and found an amazing solution using heapq
module, the the running time of this function is very less. This is below program:
from itertools import islice
import heapq
def nlargest(n, iterable):
"""Find the n largest elements in a dataset.
Equivalent to: sorted(iterable, reverse=True)[:n]
"""
if n < 0:
return []
it = iter(iterable)
result = list(islice(it, n))
if not result:
return result
heapq.heapify(result)
_heappushpop = heapq.heappushpop
for elem in it:
_heappushpop(result, elem)
result.sort(reverse=True)
return result
print nlargest(5, [10, 122, 2, 3, 3, 4, 5, 5, 10, 12, 23, 18, 17, 15, 100, 101])
This algorithm is really clever and you can also do the visualize here LINK
But I am having a hard time understanding the time complexity of the whole algorithm. Here is my analysis, and please correct me if I am wrong!
Time Complexity :
result = list(islice(it, n)) - > O(n) heapq.heapify(result) -> O(len(result)) for elem in it: _heappushpop(result, elem) -> I am confused at this part result.sort(reverse=True) -> O(len(result)*log(len(result)))
Could anyone help me understand the overall time complexity of the algorithm.
Upvotes: 1
Views: 721
Reputation: 64368
So you have two relevant paramaters here: n
(the number of items to return), and, say, M
(the number of items in the dataset).
islice(it, n) -- O(n)
heapify(result) -- O(n), because len(result)=n
for elem in it: _heappushpop(result, elem) -- performing M-N times an operation of O(logn), because len(result) remains n, i.e. (M-N)*logn
result.sort(reverse=True) -- O(n*logn)
Overall:
n + n + (M-n)*logn + n*logn
Resulting with O(M*logn)
. You can easily see the dominant part is the heappushpop loop (assuming M>>n, otherwise the problem is less interesting, because the solution is more or less reduced to sorting).
It is worth pointing out there are linear-time algorithms for solving this problem, so if your dataset is very big, it is worth checking them out.
Upvotes: 1