Chase B
Chase B

Reputation: 31

What is the Time Complexity of finding the max k integers?

def max_k_sort(k, nums):
    # sort nums first using timsort
    # add O(n*log(n)) time complexity
    sorted_nums = sorted(nums)

    return sorted_nums[-1*k:len(nums)]

def max_k(k, nums):
    # build initial max number list
    max_nums = {}

    # add O(k) time complexity?
    i = 0
    while i < k:
        max_nums[i] = 0
        i += 1

    # add O(n) time complexity?
    least_max_key = min(max_nums, key=max_nums.get)
    least_max = max_nums[least_max_key]

    # add O(n) time complexity?
    for n in nums:
        if n > least_max:
            max_nums[least_max_key] = n
            least_max_key = min(max_nums, key=max_nums.get)
            least_max = max_nums[least_max_key]

    return max_nums.values()

print(max_k(5, [2, 8, 4, 9, 0, 12, 12, 6, 5]))

I'm quite unsure of the time complexity of this code. The task is to return the max k numbers from an unsorted integer array. Each number in the array is in the range [0, 10000). My goal is to have one obvious solution max_k_sort(k, nums) that does the task in O(n*log(n)) time complexity and another method max_k(k, nums) that does the task in O(n) time complexity where n is the number of integers passed and k is the number of max values to find. I can't help but wonder if there's a way to return the max values sorted in O(n) time complexity.

Upvotes: 2

Views: 3011

Answers (2)

dawg
dawg

Reputation: 103814

The time complexity of list operations in Python states list sorts are O(N log N).

Slices are O(k)

So:

def max_k(k, nums):
    nums.sort(reverse=True)
    return nums[0:k]

O(k) + O(n log n) is O(n log n) where O(k) is less than O(n log n)

>>> max_k(5, [2, 8, 4, 9, 0, 12, 12, 6, 5])
[12, 12, 9, 8, 6]

As a practical matter, try timing them:

import heapq
def max_k1(k, nums):
    nums.sort(reverse=True)
    return nums[0:k]

def max_k2(k, nums):
    return heapq.nlargest(k, nums)    

if __name__ == '__main__':
    import timeit
    for f in (max_k1, max_k2):
        li=[2, 8, 4, 9, 0, 12, 12, 6, 5]
        print f.__name__, timeit.timeit('f(5, li)', setup='from __main__ import f, li')  

Prints:

max_k1 0.240165948868
max_k2 4.96488595009

So sort and slice is 20x faster than heapq.


Based on the comment:

import heapq
def max_k1(k, nums):
    nums.sort(reverse=True)
    return nums[0:k]

def max_k2(k, nums):
    return heapq.nlargest(k, nums)   

def max_k3(k, nums):
    return sorted(nums, reverse=True)[0:k]    

if __name__ == '__main__':
    import timeit
    for f in (max_k1, max_k2, max_k3):
        li=[2, 8, 4, 9, 0, 12, 12, 6, 5]
        print f.__name__, timeit.timeit('f(5, li)', setup='from __main__ import f, li')    

max_k1 0.242296934128
max_k2 4.52635192871
max_k3 0.332237005234

Upvotes: 0

roippi
roippi

Reputation: 25954

for n in nums:
        if n > least_max:
            max_nums[least_max_key] = n
            least_max_key = min(max_nums, key=max_nums.get) # this is O(k)
            least_max = max_nums[least_max_key]

You're doing an O(k) operation n times, so the complexity of your second function is O(n*k).

Assuming you want the output in sorted order, this can be done most easily in O(n*log(k)) by creating a k-sized heap and pushing everything onto it. This is implemented for you in heapq.nlargest.

import heapq

heapq.nlargest(5, [2, 8, 4, 9, 0, 12, 12, 6, 5])
Out[4]: [12, 12, 9, 8, 6]

If you don't want the output in sorted order, this can technically be done in O(n). There exist algorithms (and python implementations) to find the kth largest element in an array in linear time; it's easy to see that one more pass through the array would allow you to build an array of all numbers k and larger, giving overall O(n).

Upvotes: 9

Related Questions