Reputation: 1
Below are my implementations for the same: 1) Using Dictionary by keeping a count:
def anagram(s1,s2):
if len(s1)!=len(s2):
return "No"
count={}
for let in s1:
if let not in count:
count[let]=1
else:
count[let]+=1
for let2 in s2:
if let2 in count:
count[let2]-=1
else:
count[let2]=1
for k in count:
if count[k]!=0:
return "No"
return "yes"
2) Using the built-in sorting function and comparing the strings:
def anagram2(s1,s2):
if len(s1)!=len(s2):
return "no"
s1=str(sorted(s1))
s2=str(sorted(s2))
if s1.lower()==s2.lower():
return "yes"
else:
return "No"
3) Using the COunter from collections:
from collections import Counter
def anagram3(s1,s2):
if len(s1)!=len(s2):
return "no"
s1=s1.lower()
s2=s2.lower()
if Counter(s1)==Counter(s2):
return "Yes"
else:
return "no"
Which has a better time complexity among the two? And are there any other approaches to make this anagram problem more efficient? if yes please let me know.
Upvotes: 0
Views: 352
Reputation: 26896
There is no approach which is asymptotically more efficient than your first one, which runs O(n)
. The second approach runs in O(n log n)
.
The space complexity is O(k)
with k < n
for the first approach, with k
being the number of unique elements, and O(n)
for the second approach. Even considering the constant terms (which are generally omitted in asymptotic analysis), the first approach is more efficient since they both get a factor 2.
Within the class of O(n)
algorithms, you could add some short-circuiting to improve the performances in some cases, and even write it into a single loop (which allows for some other short-circuiting at the expenses of some others), e.g.:
def is_anagram_1loop(a, b):
if len(a) != len(b):
return False
count = {}
for x, y in zip(a, b):
if x == y: # short-circuit
continue
if x not in count:
count[x] = 1
else:
count[x] += 1
if y not in count:
count[y] = 1 # cannot short-circuit here
else:
count[y] -= 1
for item, num in count.items():
if num:
return False
return True
def is_anagram_2loops(a, b):
if len(a) != len(b):
return False
count = {}
for x in a:
if x not in count:
count[x] = 1
else:
count[x] += 1
for y in b:
if y not in count:
# it should have beed found while looping through `a`
return False
else:
count[y] -= 1
for item, num in count.items():
if num:
return False
return True
is_anagram_2loops()
could have some more short-circuiting, e.g. it could quit as soon as count[y]
is about to get negative, e.g.:
def is_anagram_2loops_b(a, b):
if len(a) != len(b):
return False
count = {}
for x in a:
if x not in count:
count[x] = 1
else:
count[x] += 1
for y in b:
if y not in count:
# it should have beed found while looping through `a`
return False
elif count[y] == 0:
# it would get negative considering current `y`
return False
else:
count[y] -= 1
for item, num in count.items():
if num:
return False
return True
However, speed-wise, using collections.Counter()
is generally much faster since both loopings are done implicitly:
import collections
def is_anagram_counter(a, b):
if len(a) == len(b):
return collections.Counter(a) == collections.Counter(b)
else:
return False
The downside of the above could be the larger (roughly twice) memory consumption and that only the len()
-based short-circuit could be implemented.
Upvotes: 1
Reputation: 4772
The CPython implementation has byte code implementation of what can be calls to much faster underlying C-code. This complicates questions of "What is faster", big-O and complexity. If it is really important then best to generate/aquire test data and measure.
For a general case I would first compare lengths which is quick, then sort the lowercased versions of each string and compare the result, (but only if lengths are the same). Sorting is done in a C routine internally with its own big-O and timing constants.
Something like:
def paddys(s1, s2):
return len(s1) == len(s2) and sorted(s1.lower()) == sorted(s2.lower())
The and
above is what is known as short-circuiting: If the fast len comparison fails then the whole expression returns False without going to the trouble of executing the sorts etc on the RHS.
Run all methods through your test data and measure their actual performance.
Upvotes: 0
Reputation: 3066
First implementation is more efficient: it takes only O(s1+s2)
. In second implementation you use sorted
function, which is O(nlogn)
(see this: What is the complexity of this python sort method?) so total is O(s1*log(s1)+s2*log(s2))
Edit: And as someone else said in comments, collections.Counter
is the easiest way:
from collections import Counter
if Counter(s1) == Counter(s2):
return "yes"
return "no"
(not sure if you want it case sensitive or not)
Upvotes: 0