Travis
Travis

Reputation: 21

Time complexity of Top K Frequent Elements in Python (Leetcode)

I am trying to understand a more efficient solution to Problem 347 on Leetcode. My solution was not as efficient. I do not understand why this solution runs in O(n) time.

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        # count up the frequencies of each number: # O(n)
        counts = {}
        for n in nums:
            counts[n] = 1 + counts.get(n, 0) # add one to the count, default if not found is 0

        
        # create our buckets: O(n)
        freq_buckets = for [[] for i in range(len(nums) + 1)]

        # add the counts to each of the buckets
        for num, count in counts.items():
            # for that specific frequency, add this num to the list of 
            # numbers occurring at that frequency
            freq_buckets[count].append(num)

        # hold our k most frequent
        most_frequent = []
        # iterate from most frequent to least frequent
        for i in range(len(freq_buckets) - 1, 0, -1): # O(n): loop 3
            for num in freq_buckets[i]: # O(n): loop 4
                most_frequent.append(num)

                # break when we have k most frequent
                if len(most_frequent) == k:
                    return most_frequent

I believed it ran in O(kn) time for a moment because loop 3 would only run until k. However, on further inspection, I can see there is a balance between loop 3 and loop 4 because if the outer loop runs for n iterations then the inner loop could run between 0 and n + 1 times. This leads me to believe that if the inner loop runs n times then we must be at the bucket of 1 which means we've iterated over the outer loop almost n times and then iterate over the inner loop k times (which could be equal to n if n = k).

There are cases that I'm struggling to reason through. Am I correct in thinking that this algorithm does not run in O(n) time? If not, please explain why.

I have tried analysing the time complexities of each line and expected to find O(n).

I have also read another post here but do not understand how they come to their conclusion.

Upvotes: 1

Views: 252

Answers (1)

Mike Smith
Mike Smith

Reputation: 456

Every iteration of the inner loop will append one element to most_frequent. Once the list most_frequent reaches size k, the function will terminate. In addition, most_frequent does not reset every time the outer loop increments by 1. Thus, the inner loop will run at most k times, total, regardless of how many times the outer loop is running.

A complexity analysis on this gives O(N+K) -- N for the outer loop, and K for the inner loop. Since K <= N, this is equivalent to O(2N) ~ O(N).

The key idea here is that the inner loop is not running k times for every one of the outer loop's iterations. The number of iterations of the inner loop across all iterations of the outer loop is bounded by a constant k.

Upvotes: 0

Related Questions