Reputation: 31
from collections import Counter

def findEquals(words, word):
    wordCounts = sorted(Counter(word).values())
    equals = []
    for word in words:
        wordsCounts = sorted(Counter(word).values())
        if wordCounts == wordsCounts:
            equals.append(word)
    return equals
So I have this loop inside my code, where words is a list of words. For each word in the list, I store the letter frequencies in a Counter, whose values I then sort with sorted(). I know that sorted() has a worst-case time complexity of O(n log(n)). Am I right to say that the worst-case time complexity of the whole code is O(n^2 log(n)), since sorted() is used inside a for loop?
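For example, two words with the same letter-frequency distribution produce equal sorted signatures even when the letters themselves differ:

```python
from collections import Counter

# "daddy" has letter counts d:3, a:1, y:1; "babdb" has b:3, a:1, d:1.
# Sorting the count values gives the same frequency signature for both.
print(sorted(Counter("daddy").values()))  # [1, 1, 3]
print(sorted(Counter("babdb").values()))  # [1, 1, 3]
```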
Upvotes: 3
Views: 3336
Reputation: 16935
The asymptotic analysis here will be a little weird because as you've written it, it's actually a function of three inputs: the haystack, the length of each word in the haystack, and the needle.
To simplify, you can cache the sorted frequency values of the needle: they are static, and computing them once is dwarfed by the iteration through the haystack.
I've also simplified the code to use filtering, which will abstract out the loop iteration. This won't have a theoretical impact on the performance of the algorithm, but returning an iterator will yield lazy results, which might improve real performance.
Because the frequencies will be integers, they can be sorted in O(n) with respect to the number of frequencies.
Thus, for every element in the haystack, you'll sort and compare with the needle's frequencies. This will be O(n*m), where n is the size of the haystack and m is the size of each element in the haystack.
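As a sketch of that claim: each frequency is a small integer bounded by the word's length, so a bucket/counting sort runs in linear time. (`counting_sort` here is a hypothetical helper for illustration, not part of the answer's code.)

```python
def counting_sort(freqs, max_value):
    # Tally how many times each frequency value occurs; O(n + max_value).
    buckets = [0] * (max_value + 1)
    for f in freqs:
        buckets[f] += 1
    # Emit each value as many times as it was seen, in ascending order.
    out = []
    for value, count in enumerate(buckets):
        out.extend([value] * count)
    return out
```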
Therefore, your code can be simplified:
from collections import Counter

def find_same_frequency_distribution(haystack, needle):
    needle_counts = sorted(Counter(needle).values())
    return filter(lambda x: sorted(Counter(x).values()) == needle_counts, haystack)
>>> def find_same_frequency_distribution(haystack, needle):
...     needle_counts = sorted(Counter(needle).values())
...     return filter(lambda x: sorted(Counter(x).values()) == needle_counts, haystack)
...
>>> li = ["daddy", "mummy", "dddya", "foosa"]
>>> for e in find_same_frequency_distribution(li, "babdb"):
...     print(e)
...
daddy
mummy
dddya
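To illustrate the laziness point: because filter returns an iterator, a caller that only needs the first match never sorts the remaining elements. A small sketch:

```python
from collections import Counter

def find_same_frequency_distribution(haystack, needle):
    needle_counts = sorted(Counter(needle).values())
    return filter(lambda x: sorted(Counter(x).values()) == needle_counts, haystack)

# next() pulls items from the filter only until the first match is found;
# the later haystack entries are never counted or sorted.
first_match = next(find_same_frequency_distribution(["daddy", "mummy", "foosa"], "babdb"))
print(first_match)  # daddy
```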
Upvotes: 0
Reputation: 76194
Am I right to say that the worst-case time complexity of the whole code is O(n^2 log(n)) since sorted() is used inside a for loop?
Not necessarily. It's true that a for loop is O(N), and that sorted is O(N log N), but the "N"s in those expressions refer to different values. The N in O(N) refers to the length of words, and the N in O(N log N) refers to the length of word.
It would only be correct to say that the total complexity of the algorithm is O(N^2 log N) if the average length of each word string were equal to the length of the words list; for instance, if words contained five words, each having five letters. But it is unlikely that this is the case. A more general conclusion to make might be that the algorithm is O(m * n log n), with m corresponding to the size of words, and n corresponding to the average size of a word.
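To make the two different N's concrete, here is a sketch of the loop from the question, lightly restructured (the needle's sort is hoisted out of the loop, which doesn't change the asymptotics) and annotated with which factor each part contributes:

```python
from collections import Counter

def find_equals(words, word):
    # Computed once, outside the loop; negligible next to the scan below.
    target = sorted(Counter(word).values())
    equals = []
    for w in words:                     # runs m times, m = len(words)
        # Each iteration sorts one word's frequency values:
        # O(n log n), where n is that word's length.
        if sorted(Counter(w).values()) == target:
            equals.append(w)
    return equals                       # total: O(m * n log n)
```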
Upvotes: 4