Time complexity of an optimized anagram function

Question

This is not a homework problem. I'm prepping for an interview, and have done quite a bit of research on links on this post. I coded up a solution based on a suggestion, but I disagree with the time complexity proposed. I'd like to know whether I'm incorrect/correct in my assertion.

Below is a function to spit out group of anagrams. It sorted each input words and put the sorted input word in a dictionary. I wrote the code myself based on a hint from geeksforgeeks posting that suggest:

Using sorting: We can sort array of strings so that all anagrams come together. Then print all anagrams by linearly traversing the sorted array. The time complexity of this solution is O(mnLogn) (We would be doing O(nLogn) comparisons in sorting and a comparison would take O(m) time). Where n is number of strings and m is maximum length of a string.

I disagree with the time complexity mentioned

I think the time complexity for the following code is n(m log m). Space complexity is: O(2n)= O(n) for the results and sorted_dict variable

n= number of words, m = number of character in a word

def groupAnagrams(strs):
  sorted_dict ={}
  results=[]
  for each in strs:#loop: O(n)
     #time complexity for sort: O(m log m). 
     sorted_str = "".join(sorted(each.lower())) #O(m) 
     if  not sorted_dict.get(sorted_str):  #0(1)
         sorted_dict[sorted_str] = []
     sorted_dict[sorted_str].append(each) #0(1)

  for k,v in sorted_dict.items(): #0(n)
     results.append(v)
  return results

kaya3 · Accepted Answer

Your algorithm has time complexity O(mn log m), dominated by the time it takes to sort each of the strings in the array; so your analysis is correct. However, your result differs from the one you quoted not because the quote is wrong, but because your algorithm is different to the one analysed in the quote. Note that the quote says:

We can sort array of strings so that all anagrams come together.

Your algorithm does not do this; it does not sort the array of strings at all, rather it sorts the characters in each string individually. Here's an implementation in Python of the algorithm that this quote is talking about:

from itertools import groupby

NO_OF_CHARS = 256

def char_freqs(word):
    count = [0] * NO_OF_CHARS
    for c in word: 
        count[ord(c)] += 1
    return count

def print_anagrams_together(words):
    words = sorted(words, key=char_freqs)
    for _, group in groupby(words, key=char_freqs):
        print(*group, sep=', ')

The time complexity can be determined as follows:

char_freqs takes O(m) time because of iterating over a string of length m.
Sorting takes O(mn + n log n) time, because the key function takes O(m) time and is called for n strings, and then the strings are sorted in O(n log n) time. The comparisons in the sort are done on lists of length NO_OF_CHARS (a constant), so the comparisons take constant time.
Grouping words together takes O(mn) time because it's dominated by calling char_freqs again n times; this could be improved to O(n) by reusing the already-computed keys from the sort, but this part is dominated by sorting anyway.

That gives an overall time complexity of O(mn + n log n), which is not the same as quoted, but you would get O(mn log n) if the key function char_freqs were called for every comparison, instead of once per element and cached. For example, if you did the sorting in Java using something like:

// assuming that charFreqs returns something comparable
Collections.sort(words, Comparator.comparing(Solution::charFreqs));

Then the comparisons would take O(m) time instead of O(1) time, and the overall time complexity would be O(mn log n). So the quote isn't wrong, it's just talking about a different algorithm to the one you were thinking of, and it assumes a suboptimal implementation of it.

Time complexity of an optimized anagram function

Answers (1)

Related Questions