Angular
Angular

Reputation: 623

Python check for Anagram in O(n) solution

I'm trying to check if 2 strings are anagrams. This solution is simple, but not efficient (Ologn) I know I could use Collections and Counter, then compare the occurrence of each character, but I'm trying to avoid any modules for an interview. What would be the fastest way to solve this problem? (Perhaps, checking occurrence of each character?)

def check(word1,word2):

    return sorted(word1)==sorted(word2)

Upvotes: 2

Views: 929

Answers (3)

Tamas Hegedus
Tamas Hegedus

Reputation: 29906

Your code doesn't even return a correct value. This one-liner is O(n log n):

return sorted(word1) == sorted(word2)

For an O(n) solution, you can count all characters:

from collections import Counter
# ...
def check(a, b)
  return Counter(a) == Counter(b)

Without collections it is much longer:

def check(a, b):
    chars = dict.fromkeys(a + b, 0)
    for c in a:
        chars[c] += 1
    for c in b:
        chars[c] -= 1
    return not any(chars.values())

This code does the following:

  • chars = dict.fromkeys(a + b, 0): Creates a dict, which has all the occurring characters in either word as keys set to 0.
  • for c in a: chars[c] += 1: this will iterate over a and count the occurrences of each character in it. chars now contains the count of separate characters, (and some zeroes for characters in b but not a)
  • for c in b: chars[c] -= 1: much the same as before, but instead this will subtract the character counts of b from chars
  • return not any(chars.values()): chars['h'] == 0 if and only if a and b has the same amount of 'h'. This line checks if chars has only zeroes as values, meaning that all characters have the same count in both inputs. (as any returns if there is any truthy value in the sequence. 0 is falsy, every other integer is truthy.)

Both lists get iterated over once. Assuming O(1) access time for dictionaries makes the whole algorithm run in O(n) time (where n is the total length of the inputs). Space complexity is O(n) too (all characters can be distinct). Don't make that mistake when they ask you complexity. It's not necessary time complexity.

Upvotes: 3

akosel
akosel

Reputation: 1183

Here's a nice option from http://interactivepython.org/runestone/static/pythonds/AlgorithmAnalysis/AnAnagramDetectionExample.html:

def anagramSolution(s1,s2):

    TABLE_SIZE = 128
    c1 = [0]*TABLE_SIZE
    c2 = [0]*TABLE_SIZE

    for ch in s1:
        pos = ord(ch)
        c1[pos] = c1[pos] + 1

    for ch in s2:
        pos = ord(ch)
        c2[pos] = c2[pos] + 1

    j = 0
    stillOK = True
    while j<TABLE_SIZE and stillOK:
        if c1[j]==c2[j]:
            j = j + 1
        else:
            stillOK = False

    return stillOK

This runs in O(n). Essentially, you loop over both strings, counting the occurrences of each letter. In the end, you can simply iterate over each letter, making sure the counts are equal.

As noted in the comments, this will have a harder time scaling for unicode. If you expect unicode, you would likely want to use a dictionary.

Upvotes: 2

timgeb
timgeb

Reputation: 78650

I'd write it like this without imports:

def count_occurences(mystring):
    occs = {}
    for char in mystring:
        if char in occs:
            occs[char] += 1
        else:
            occs[char] = 1
    return occs

def is_anagram(str1, str2):
    return count_occurences(str1) == count_occurences(str2)

Or, if you can use imports, just not a Counter, use a defaultdict:

from collections import defaultdict

def count_occurences(mystring):
    occs = defaultdict(int)
    for char in mystring:
        occs[char] += 1

    return occs

def is_anagram(str1, str2):
    return count_occurences(str1) == count_occurences(str2)

Upvotes: 0

Related Questions