Reputation: 134
I've heard the following problem in interviews twice now:
Given a long list of strings, str_list, and a target string, return True if the target string is an anagram of any of the words in str_list
(Assume multiple queries are possible, so we don't want a solution with early-returns or w/e)
I've implemented the following solution twice, which I believe to be O(NM) where N is the length of str_list and M is the length of the longest word in str_list.
def is_anagram(str_list,target):
anagrams = set()
#O(N)
for s in str_list:
#O(M)
anagrams.add(frozenset(Counter(string).items()))
#O(len(target), but we could just check length vs. longest str in str_list, so O(M))
return frozenset(Counter(target).items()) in anagrams
##Total = O(N*M)
Each time they've told me that it was decent, but that I could do better with the following version, using sorting:
def sorted_is_anagram(str_list,target):
#O(N)
for i,string in enumerate(str_list):
#MLog(M)
str_list[i] = "".join(sorted(string))
#O(M*NLogN)
str_list = set(sorted(str_list))
#O(M(Log(M)))
return "".join(sorted(target)) in str_list
##Total = O(N*MLog(M) + M*NLogN)
Am I crazy or is their version simply not better? What have I messed up?
Thank you!!
Upvotes: 0
Views: 192
Reputation: 134005
It's possible that you're misunderstanding the question. Typically, the question involves being given a large set of strings one time, and then there are multiple queries against that set of strings. So, for example, you might be given a list of 250,000 English words as your string set. Then, there will be many millions of queries against that list.
The solution involves two parts. First, you create a dictionary that has as its key the sorted anagram. The value is the list of words that can be made up of those letters. So, given the words "list" and "silt", you'd have:
{key="ilst", value={"list","silt"}}
You save the resulting dictionary so that it can be used across calls.
Now, when somebody makes a query, you sort the letters in the given word, look it up in the dictionary, and respond with the anagrams.
If you're given N words whose average length is M, then building the dictionary is O(N*(M log M)). But that's a one-time cost.
Each query requires you to sort the letters in the passed word, which, again, is O(M log M). But the dictionary lookup is O(1). So the cost for K queries is O(K * (M log M)).
Upvotes: 1