Sumanth Sai
Sumanth Sai

Reputation: 1

Which one is better in terms of Time and Space Complexity for checking wheter a string is anagram or not in Python?

Below are my implementations for the same: 1) Using Dictionary by keeping a count:

def anagram(s1,s2):
if len(s1)!=len(s2):
    return "No"
count={}
for let in s1:
    if let not in count:
        count[let]=1
    else:
        count[let]+=1
for let2 in s2:
    if let2 in count:
        count[let2]-=1
    else:
        count[let2]=1
for k in count:
    if count[k]!=0:
        return "No"
return "yes"

2) Using the built-in sorting function and comparing the strings:

def anagram2(s1,s2):
if len(s1)!=len(s2):
    return "no"
s1=str(sorted(s1))
s2=str(sorted(s2))
if s1.lower()==s2.lower():
    return "yes"
else:
    return "No"

3) Using the COunter from collections:

from collections import Counter
def anagram3(s1,s2):
    if len(s1)!=len(s2):
       return "no"
    s1=s1.lower()
    s2=s2.lower()
    if Counter(s1)==Counter(s2):
       return "Yes"
    else:
       return "no"

Which has a better time complexity among the two? And are there any other approaches to make this anagram problem more efficient? if yes please let me know.

Upvotes: 0

Views: 352

Answers (3)

norok2
norok2

Reputation: 26896

There is no approach which is asymptotically more efficient than your first one, which runs O(n). The second approach runs in O(n log n).

The space complexity is O(k) with k < n for the first approach, with k being the number of unique elements, and O(n) for the second approach. Even considering the constant terms (which are generally omitted in asymptotic analysis), the first approach is more efficient since they both get a factor 2.

Within the class of O(n) algorithms, you could add some short-circuiting to improve the performances in some cases, and even write it into a single loop (which allows for some other short-circuiting at the expenses of some others), e.g.:

def is_anagram_1loop(a, b):
    if len(a) != len(b):
        return False
    count = {}
    for x, y in zip(a, b):
        if x == y:  # short-circuit
            continue
        if x not in count:
            count[x] = 1
        else:
            count[x] += 1
        if y not in count:
            count[y] = 1  # cannot short-circuit here
        else:
            count[y] -= 1
    for item, num in count.items():
        if num:
            return False
    return True
def is_anagram_2loops(a, b):
    if len(a) != len(b):
        return False
    count = {}
    for x in a:
        if x not in count:
            count[x] = 1
        else:
            count[x] += 1
    for y in b:
        if y not in count:
            # it should have beed found while looping through `a`
            return False
        else:
            count[y] -= 1
    for item, num in count.items():
        if num:
            return False
    return True

is_anagram_2loops() could have some more short-circuiting, e.g. it could quit as soon as count[y] is about to get negative, e.g.:

def is_anagram_2loops_b(a, b):
    if len(a) != len(b):
        return False
    count = {}
    for x in a:
        if x not in count:
            count[x] = 1
        else:
            count[x] += 1
    for y in b:
        if y not in count:
            # it should have beed found while looping through `a`
            return False
        elif count[y] == 0:
            # it would get negative considering current `y`
            return False
        else:
            count[y] -= 1
    for item, num in count.items():
        if num:
            return False
    return True

However, speed-wise, using collections.Counter() is generally much faster since both loopings are done implicitly:

import collections


def is_anagram_counter(a, b):
    if len(a) == len(b):
        return collections.Counter(a) == collections.Counter(b)
    else:
        return False

The downside of the above could be the larger (roughly twice) memory consumption and that only the len()-based short-circuit could be implemented.

Upvotes: 1

Paddy3118
Paddy3118

Reputation: 4772

The CPython implementation has byte code implementation of what can be calls to much faster underlying C-code. This complicates questions of "What is faster", big-O and complexity. If it is really important then best to generate/aquire test data and measure.

For a general case I would first compare lengths which is quick, then sort the lowercased versions of each string and compare the result, (but only if lengths are the same). Sorting is done in a C routine internally with its own big-O and timing constants.

Something like:

def paddys(s1, s2):
    return len(s1) == len(s2) and sorted(s1.lower()) == sorted(s2.lower())

The and above is what is known as short-circuiting: If the fast len comparison fails then the whole expression returns False without going to the trouble of executing the sorts etc on the RHS.

Run all methods through your test data and measure their actual performance.

Upvotes: 0

Roim
Roim

Reputation: 3066

First implementation is more efficient: it takes only O(s1+s2). In second implementation you use sorted function, which is O(nlogn) (see this: What is the complexity of this python sort method?) so total is O(s1*log(s1)+s2*log(s2))

Edit: And as someone else said in comments, collections.Counter is the easiest way:

from collections import Counter
if Counter(s1) == Counter(s2):
     return "yes"
return "no"

(not sure if you want it case sensitive or not)

Upvotes: 0

Related Questions