Baqdul
Baqdul

Reputation: 31

Time complexity of python sorted method inside a loop

def findEquals(words, word):
    wordCounts = sorted(Counter(word).values())
    equals = []

    for word in words:
        wordsCounts = sorted(Counter(word).values())
        if wordCounts == wordsCounts:
            equals.append(word)

    return equals

So I have this loop inside my code where words is a list of words. For each word in the list, I store the frequency of letters in a Counter as values which is then sorted using the method sorted(). I know that sorted() has a worst time complexity of O(nlog(n)). Am I right to say that the worst time complexity of the whole code is O(n^2 log(n)) since sorted() is used inside a for loop?

Upvotes: 3

Views: 3336

Answers (2)

erip
erip

Reputation: 16935

The asymptotic analysis here will be a little weird because as you've written it, it's actually a function of three inputs: the haystack, the length of each word in the haystack, and the needle.

To make it more simple, you can cache the sorted values of the needle: these are static (and this operation will be overwhelmed by the iteration through the haystack).

I've also simplified the code to use filtering, which will abstract out the loop iteration. This won't have a theoretical impact on the performance of the algorithm, but returning an iterator will yield lazy results, which might improve real performance.

Because the frequencies will be integers, they can be sorted in O(n) with respect to the number of frequencies.

Thus, for every element in the haystack, you'll sort and compare with the needle's frequency. This will be O(n*m) where n is the size of haystack and m is the size of each element in haystack.

Therefore, your code can be simplified:

def find_same_frequency_distribution(haystack, needle):
    needle_counts = sorted(Counter(needle).values())
    return filter(lambda x: sorted(Counter(x).values()) == needle_counts, haystack)

>>> def find_same_frequency_distribution(haystack, needle):
...     needle_counts = sorted(Counter(needle).values())
...     return filter(lambda x: sorted(Counter(x).values()) == needle_counts, haystack)
...
>>> li = ["daddy", "mummy", "dddya", "foosa"]
>>> for e in find_same_frequency_distribution(li, "babdb"):
...     print(e)
...
daddy
mummy
dddya

Upvotes: 0

Kevin
Kevin

Reputation: 76194

Am I right to say that the worst time complexity of the whole code is O(n^2 log(n)) since sorted() is used inside a for loop?

Not necessarily. It's true that a for loop is O(N), and that sorted is O(N log N), but the "N"s in those expressions refer to different values. The N in O(N) refers to the length of words, and the N in O(N log N) refers to the length of word.

It would only be correct to say that the total complexity of the algorithm is O(N^2 log N), if the average length of each word string is equal to the length of the words list. For instance, if words contained five words, each having five letters. But it is unlikely that this is the case. A more general conclusion to make might be that the algorithm is O(m * n log n), with m corresponding to the size of words, and n corresponding to the average size of a word.

Upvotes: 4

Related Questions