Reputation: 5261
Suppose I have a list of strings like ["car", "tree", "boy", "girl", "arc"]
etc. I want to find groups of anagrams in that list - in this case, (car, arc)
.
I tried writing code to loop over the list and compare pairs of strings, but how do I account for the fact that the letters can be in a different order?
For the specific case of checking whether a single pair of strings are anagrams of each other, see Checking strings against each other (Anagrams).
Upvotes: 31
Views: 153761
Reputation: 105
NOTE: This is not a new solution but I have added comments and example to help someone understand logic when using dictionaries.
Find anagrams from a given list of strings
["dog", "god", "cat"]
{dgo: ['dog', 'god'], act: ['cat'] }
['dog', 'god']
def is_anagram(list_of_str):
new_list = []
empty_dict ={}
for item in list_of_str:
sorted_item = ''.join(sorted(item))
if sorted_item not in empty_dict:
empty_dict[sorted_item] = [item]
else:
empty_dict[sorted_item].append(item)
# return empty_dict
for k,v in empty_dict.items():
if len(v)>=2:
new_list.append(v)
return new_list
a = ['dog', 'god', 'cat']
print(is_anagram(a))
Upvotes: -1
Reputation: 2162
# list of words
words = ["ROOPA","TABU","OOPAR","BUTA","BUAT" , "PAROO","Soudipta",
"Kheyali Park", "Tollygaunge", "AROOP","Love","AOORP",
"Protijayi","Paikpara","dipSouta","Shyambazaar",
"jayiProti", "North Calcutta", "Sovabazaar"]
#Method 1
A = [''.join(sorted(word)) for word in words]
dict ={}
for indexofsamewords,samewords in enumerate(A):
dict.setdefault(samewords, []).append(indexofsamewords)
print(dict)
#{'AOOPR': [0, 2, 5, 9, 11], 'ABTU': [1, 3, 4], 'Sadioptu': [6, 14], ' KPaaehiklry': [7], 'Taeggllnouy': [8], 'Leov': [10], 'Paiijorty': [12, 16], 'Paaaikpr': [13], 'Saaaabhmryz': [15], ' CNaachlortttu': [17], 'Saaaaborvz': [18]}
for index in dict.values():
print( [words[i] for i in index ] )
The Output :
['ROOPA', 'OOPAR', 'PAROO', 'AROOP', 'AOORP']
['TABU', 'BUTA', 'BUAT']
['Soudipta', 'dipSouta']
['Kheyali Park']
['Tollygaunge']
['Love']
['Protijayi', 'jayiProti']
['Paikpara']
['Shyambazaar']
['North Calcutta']
['Sovabazaar']
Upvotes: 0
Reputation: 11
def all_anagrams(words: [str]) -> [str]:
word_dict = {}
for word in words:
sorted_word = "".join(sorted(word))
if sorted_word in word_dict:
word_dict[sorted_word].append(word)
else:
word_dict[sorted_word] = [word]
return list(word_dict.values())
Upvotes: 1
Reputation: 633
Simply use the Counter method available in Python3 collections package.
str1="abc"
str2="cab"
Counter(str1)==Counter(str2)
# returns True i.e both Strings are anagrams of each other.
Upvotes: 1
Reputation: 3683
Since you can't import anything, here are two different approaches including the for loop you asked for.
Approach 1: For Loops and Inbuilt Sorted Function
word_list = ["percussion", "supersonic", "car", "tree", "boy", "girl", "arc"]
# initialize a list
anagram_list = []
for word_1 in word_list:
for word_2 in word_list:
if word_1 != word_2 and (sorted(word_1)==sorted(word_2)):
anagram_list.append(word_1)
print(anagram_list)
Approach 2: Dictionaries
def freq(word):
freq_dict = {}
for char in word:
freq_dict[char] = freq_dict.get(char, 0) + 1
return freq_dict
# initialize a list
anagram_list = []
for word_1 in word_list:
for word_2 in word_list:
if word_1 != word_2 and (freq(word_1) == freq(word_2)):
anagram_list.append(word_1)
print(anagram_list)
If you want these approaches explained in more detail, here is an article.
Upvotes: 7
Reputation: 389
This works fine:
def find_ana(l):
a=[]
for i in range(len(l)):
for j in range(len(l)):
if (l[i]!=l[j]) and (sorted(l[i])==sorted(l[j])):
a.append(l[i])
a.append(l[j])
return list(set(a))
Upvotes: 0
Reputation: 22392
Most of previous answers are correct, here is another way to compare two strings. The main benefit of using this strategy versus sort is space/time complexity which is n log of n.
1.Check the length of string
2.Build frequency Dictionary and compare if they both match then we have successfully identified anagram words
def char_frequency(word):
frequency = {}
for char in word:
#if character is in frequency then increment the value
if char in frequency:
frequency[char] += 1
#else add character and set it to 1
else:
frequency[char] = 1
return frequency
a_word ='google'
b_word ='ooggle'
#check length of the words
if (len(a_word) != len(b_word)):
print ("not anagram")
else:
#here we check the frequecy to see if we get the same
if ( char_frequency(a_word) == char_frequency(b_word)):
print("found anagram")
else:
print("no anagram")
Upvotes: 1
Reputation: 449
This one is gonna help you:
Assuming input is given as comma separated strings
console input: abc,bac,car,rac,pqr,acb,acr,abc
in_list = list()
in_list = map(str, raw_input("Enter strings seperated by comma").split(','))
list_anagram = list()
for i in range(0, len(in_list) - 1):
if sorted(in_list[i]) not in list_anagram:
for j in range(i + 1, len(in_list)):
isanagram = (sorted(in_list[i]) == sorted(in_list[j]))
if isanagram:
list_anagram.append(sorted(in_list[i]))
print in_list[i], 'isanagram'
break
Upvotes: 0
Reputation: 1
Simple Solution in Python:
def anagram(s1,s2):
# Remove spaces and lowercase letters
s1 = s1.replace(' ','').lower()
s2 = s2.replace(' ','').lower()
# Return sorted match.
return sorted(s1) == sorted(s2)
Upvotes: 0
Reputation: 11
def findanagranfromlistofwords(li):
dict = {}
index=0
for i in range(0,len(li)):
originalfirst = li[index]
sortedfirst = ''.join(sorted(str(li[index])))
for j in range(index+1,len(li)):
next = ''.join(sorted(str(li[j])))
print next
if sortedfirst == next:
dict.update({originalfirst:li[j]})
print "dict = ",dict
index+=1
print dict
findanagranfromlistofwords(["car", "tree", "boy", "girl", "arc"])
Upvotes: 1
Reputation: 4567
There are multiple solutions to this problem:
Classic approach
First, let's consider what defines an anagram: two words are anagrams of each other if they consist of the same set of letters and each letter appears exactly the same number or time in both words. This is basically a histogram of letters count of each word. This is a perfect use case for collections.Counter
data structure (see docs). The algorithms is as follows:
Here is the code:
from collections import Counter, defaultdict
def anagram(words):
anagrams = defaultdict(list)
for word in words:
histogram = tuple(Counter(word).items()) # build a hashable histogram
anagrams[histogram].append(word)
return list(anagrams.values())
keywords = ("hi", "hello", "bye", "helol", "abc", "cab",
"bac", "silenced", "licensed", "declines")
print(anagram(keywords))
Note that constructing Counter
is O(l)
, while sorting each word is O(n*log(l))
where l is the length of the word.
Solving anagrams using prime numbers
This is a more advanced solution, that relies on the "multiplicative uniqueness" of prime numbers. You can refer to this SO post: Comparing anagrams using prime numbers, and here is a sample python implementation.
Upvotes: 12
Reputation: 7913
import collections
def find_anagrams(x):
anagrams = [''.join(sorted(list(i))) for i in x]
anagrams_counts = [item for item, count in collections.Counter(anagrams).items() if count > 1]
return [i for i in x if ''.join(sorted(list(i))) in anagrams_counts]
Upvotes: 0
Reputation: 199
I'm using a dictionary to store each character of string one by one. Then iterate through second string and find the character in the dictionary, if it's present decrease the count of the corresponding key from dictionary.
class Anagram:
dict = {}
def __init__(self):
Anagram.dict = {}
def is_anagram(self,s1, s2):
print '***** starting *****'
print '***** convert input strings to lowercase'
s1 = s1.lower()
s2 = s2.lower()
for i in s1:
if i not in Anagram.dict:
Anagram.dict[i] = 1
else:
Anagram.dict[i] += 1
print Anagram.dict
for i in s2:
if i not in Anagram.dict:
return false
else:
Anagram.dict[i] -= 1
print Anagram.dict
for i in Anagram.dict.keys():
if Anagram.dict.get(i) == 0:
del Anagram.dict[i]
if len(Anagram.dict) == 0:
print Anagram.dict
return True
else:
return False
Upvotes: 0
Reputation: 295
here is the impressive solution.
funct alphabet_count_mapper:
for each word in the file/list
1.create a dictionary of alphabets/characters with initial count as 0.
2.keep count of all the alphabets in the word and increment the count in the above alphabet dict.
3.create alphabet count dict and return the tuple of the values of alphabet dict.
funct anagram_counter:
1.create a dictionary with alphabet count tuple as key and the count of the number of occurences against it.
2.iterate over the above dict and if the value > 1, add the value to the anagram count.
import sys
words_count_map_dict = {}
fobj = open(sys.argv[1],"r")
words = fobj.read().split('\n')[:-1]
def alphabet_count_mapper(word):
alpha_count_dict = dict(zip('abcdefghijklmnopqrstuvwxyz',[0]*26))
for alpha in word:
if alpha in alpha_count_dict.keys():
alpha_count_dict[alpha] += 1
else:
alpha_count_dict.update(dict(alpha=0))
return tuple(alpha_count_dict.values())
def anagram_counter(words):
anagram_count = 0
for word in words:
temp_mapper = alphabet_count_mapper(word)
if temp_mapper in words_count_map_dict.keys():
words_count_map_dict[temp_mapper] += 1
else:
words_count_map_dict.update({temp_mapper:1})
for val in words_count_map_dict.values():
if val > 1:
anagram_count += val
return anagram_count
print anagram_counter(words)
run it with file path as command line argument
Upvotes: -3
Reputation: 755
Solution in python can be as below:
class Word:
def __init__(self, data, index):
self.data = data
self.index = index
def printAnagrams(arr):
dupArray = []
size = len(arr)
for i in range(size):
dupArray.append(Word(arr[i], i))
for i in range(size):
dupArray[i].data = ''.join(sorted(dupArray[i].data))
dupArray = sorted(dupArray, key=lambda x: x.data)
for i in range(size):
print arr[dupArray[i].index]
def main():
arr = ["dog", "act", "cat", "god", "tac"]
printAnagrams(arr)
if __name__== '__main__':
main()
The time complexity of above is O(NMLogN + NMLogM) = O(NMlogN)
Upvotes: 0
Reputation: 1034
A set is an appropriate data structure for the output, since you presumably don't want redundancy in the output. A dictionary is ideal for looking up if a particular sequence of letters has been previously observed, and what word it originally came from. Taking advantage of the fact that we can add the same item to a set multiple times without expanding the set lets us get away with one for loop.
def return_anagrams(word_list):
d = {}
out = set()
for word in word_list:
s = ''.join(sorted(word))
try:
out.add(d[s])
out.add(word)
except:
d[s] = word
return out
A faster way of doing it takes advantage of the commutative property of addition:
import numpy as np
def vector_anagram(l):
d, out = dict(), set()
for word in l:
s = np.zeros(26, dtype=int)
for c in word:
s[ord(c)-97] += 1
s = tuple(s)
try:
out.add(d[s])
out.add(word)
except:
d[s] = word
return out
Upvotes: -1
Reputation: 142
If you want a solution in java,
public List<String> findAnagrams(List<String> dictionary) {
// TODO do null check and other basic validations.
Map<String, List<String>> wordMap = new HashMap<String, List<String>>();
for(String word : dictionary) {
// ignore if word is null
char[] tempWord = word.tocharArray();
Arrays.sort(tempWord);
String newWord = new String(tempWord);
if(wordMap.containsKey(newWord)) {
wordMap.put(newWord, wordMap.get(word).add(word));
} else {
wordMap.put(newWord, new ArrayList<>() {word});
}
}
List<String> anagrams = new ArrayList<>();
for(String key : wordMap.keySet()) {
if(wordMap.get(key).size() > 1) {
anagrams.addAll(wordMap.get(key));
}
}
return anagrams;
}
Upvotes: -7
Reputation: 333
You convert each of the character in a word into a number (by ord() function), add them up for the word. If two words have the same sum, then they are anagrams. Then filter for the sums that occur more than twice in the list.
def sumLet(w):
return sum([ord(c) for c in w])
def find_anagrams(l):
num_l = map(sumLet,l)
return [l[i] for i,num in enumerate(num_l) if num_l.count(num) > 1]
Upvotes: -4
Reputation: 49053
Create a dictionary of (sorted word, list of word). All the words that are in the same list are anagrams of each other.
from collections import defaultdict
def load_words(filename='/usr/share/dict/american-english'):
with open(filename) as f:
for word in f:
yield word.rstrip()
def get_anagrams(source):
d = defaultdict(list)
for word in source:
key = "".join(sorted(word))
d[key].append(word)
return d
def print_anagrams(word_source):
d = get_anagrams(word_source)
for key, anagrams in d.iteritems():
if len(anagrams) > 1:
print(key, anagrams)
word_source = load_words()
print_anagrams(word_source)
Or:
word_source = ["car", "tree", "boy", "girl", "arc"]
print_anagrams(word_source)
Upvotes: 18
Reputation: 20429
>>> words = ["car", "race", "rac", "ecar", "me", "em"]
>>> anagrams = {}
... for word in words:
... reverse_word=word[::-1]
... if reverse_word in words:
... anagrams[word] = (words.pop(words.index(reverse_word)))
>>> anagrams
20: {'car': 'rac', 'me': 'em', 'race': 'ecar'}
Logic:
Upvotes: -6
Reputation: 6208
One solution is to sort the word you're searching anagrams for (for example using sorted
), sort the alternative and compare those.
So if you would be searching for anagrams of 'rac' in the list ['car', 'girl', 'tofu', 'rca']
, your code could look like this:
word = sorted('rac')
alternatives = ['car', 'girl', 'tofu', 'rca']
for alt in alternatives:
if word == sorted(alt):
print alt
Upvotes: 9
Reputation: 8047
In order to do this for 2 strings you can do this:
def isAnagram(str1, str2):
str1_list = list(str1)
str1_list.sort()
str2_list = list(str2)
str2_list.sort()
return (str1_list == str2_list)
As for the iteration on the list, it is pretty straight forward
Upvotes: 34
Reputation: 6806
Sort each element then look for duplicates. There's a built-in function for sorting so you do not need to import anything
Upvotes: 4