Reputation: 623
I'm trying to check if 2 strings are anagrams. This solution is simple, but not efficient (Ologn) I know I could use Collections and Counter, then compare the occurrence of each character, but I'm trying to avoid any modules for an interview. What would be the fastest way to solve this problem? (Perhaps, checking occurrence of each character?)
def check(word1,word2):
return sorted(word1)==sorted(word2)
Upvotes: 2
Views: 929
Reputation: 29906
Your code doesn't even return a correct value. This one-liner is O(n log n):
return sorted(word1) == sorted(word2)
For an O(n) solution, you can count all characters:
from collections import Counter
# ...
def check(a, b)
return Counter(a) == Counter(b)
Without collections it is much longer:
def check(a, b):
chars = dict.fromkeys(a + b, 0)
for c in a:
chars[c] += 1
for c in b:
chars[c] -= 1
return not any(chars.values())
This code does the following:
chars = dict.fromkeys(a + b, 0)
: Creates a dict, which has all the occurring characters in either word as keys set to 0.for c in a: chars[c] += 1
: this will iterate over a
and count the occurrences of each character in it. chars
now contains the count of separate characters, (and some zeroes for characters in b but not a)for c in b: chars[c] -= 1
: much the same as before, but instead this will subtract the character counts of b
from chars
return not any(chars.values())
: chars['h'] == 0
if and only if a
and b
has the same amount of 'h'
. This line checks if chars
has only zeroes as values, meaning that all characters have the same count in both inputs. (as any
returns if there is any truthy value in the sequence. 0 is falsy, every other integer is truthy.)Both lists get iterated over once. Assuming O(1) access time for dictionaries makes the whole algorithm run in O(n) time (where n is the total length of the inputs). Space complexity is O(n) too (all characters can be distinct). Don't make that mistake when they ask you complexity. It's not necessary time complexity.
Upvotes: 3
Reputation: 1183
Here's a nice option from http://interactivepython.org/runestone/static/pythonds/AlgorithmAnalysis/AnAnagramDetectionExample.html:
def anagramSolution(s1,s2):
TABLE_SIZE = 128
c1 = [0]*TABLE_SIZE
c2 = [0]*TABLE_SIZE
for ch in s1:
pos = ord(ch)
c1[pos] = c1[pos] + 1
for ch in s2:
pos = ord(ch)
c2[pos] = c2[pos] + 1
j = 0
stillOK = True
while j<TABLE_SIZE and stillOK:
if c1[j]==c2[j]:
j = j + 1
else:
stillOK = False
return stillOK
This runs in O(n). Essentially, you loop over both strings, counting the occurrences of each letter. In the end, you can simply iterate over each letter, making sure the counts are equal.
As noted in the comments, this will have a harder time scaling for unicode. If you expect unicode, you would likely want to use a dictionary.
Upvotes: 2
Reputation: 78650
I'd write it like this without imports:
def count_occurences(mystring):
occs = {}
for char in mystring:
if char in occs:
occs[char] += 1
else:
occs[char] = 1
return occs
def is_anagram(str1, str2):
return count_occurences(str1) == count_occurences(str2)
Or, if you can use imports, just not a Counter
, use a defaultdict
:
from collections import defaultdict
def count_occurences(mystring):
occs = defaultdict(int)
for char in mystring:
occs[char] += 1
return occs
def is_anagram(str1, str2):
return count_occurences(str1) == count_occurences(str2)
Upvotes: 0