Reputation: 77
I'm trying to create a function in python that will print out the anagrams of words in a text file using dictionaries. I've looked at what feels like hundreds of similar questions, so I apologise if this is a repetition, but I can't seem to find a solution that fits my issue.
I understand what I need to do (at least, I think so), but I'm stuck on the final part.
This is what I have so far:
with open('words.txt', 'r') as fp:
line = fp.readlines()
def make_anagram_dict(line):
dict = {}
for word in line:
key = ''.join(sorted(word.lower()))
if key in dict.keys():
dict[key].append(word.lower())
else:
dict[key] = []
dict[key].append(word.lower())
if line == key:
print(line)
make_anagram_dict(line)
I think I need something which compares the key of each value to the keys of other values, and then prints if they match, but I can't get something to work.
At the moment, the best I can do is print out all the keys and values in the file, but ideally, I would be able to print all the anagrams from the file.
Output: I don't have a concrete specified output, but something along the lines of: [cat: act, tac]
for each anagram. Again, apologies if this is a repetition, but any help would be greatly appreciated.
Upvotes: 2
Views: 6961
Reputation: 3593
Your code is pretty much there, just needs some tweaks:
import re
def make_anagram_dict(words):
d = {}
for word in words:
word = word.lower() # call lower() only once
key = ''.join(sorted(word)) # make the key
if key in d: # check if it's in dictionary already
if word not in d[key]: # avoid duplicates
d[key].append(word)
else:
d[key] = [word] # initialize list with the initial value
return d # return the entire dictionary
if __name__ == '__main__':
filename = 'words.txt'
with open(filename) as file:
# Use regex to extract words. You can adjust to include/exclude
# characters, numbers, punctuation...
# This returns a list of words
words = re.findall(r"([a-zA-Z\-]+)", file.read())
# Now process them
d = make_anagram_dict(words)
# Now print them
for words in d.values():
if len(words) > 1: # we found anagrams
print('Anagram group {}: {}'.format(', '.join(words)))
Upvotes: 0
Reputation: 6408
I'm going to make the assumption you're grouping words within a file which are anagrams of eachother.
If, on the other hand, you're being asked to find all the English-language anagrams for a list of words in a file, you will need a way of determining what is or isn't a word. This means you either need an actual "dictionary" as in a set(<of all english words>)
or a maybe a very sophisticated predicate method.
Anyhow, here's a relatively straightforward solution which assumes your words.txt
is small enough to be read into memory completely:
with open('words.txt', 'r') as infile:
words = infile.read().split()
anagram_dict = {word : list() for word in words}
for k, v in anagram_dict.items():
k_anagrams = (othr for othr in words if (sorted(k) == sorted(othr)) and (k != othr))
anagram_dict[k].extend(k_anagrams)
print(anagram_dict)
This isn't the most efficient way to do this, but hopefully it gets accross the power of filtering.
Arguably, the most important thing here is the if (sorted(k) == sorted(othr)) and (k != othr)
filter in the k_anagrams
definition. This is a filter which only allows identical letter-combinations, but weeds out exact matches.
Upvotes: 1
Reputation: 1671
I'm not sure about the output format. In my implementation, all anagrams are printed out in the end.
with open('words.txt', 'r') as fp:
line = fp.readlines()
def make_anagram_dict(line):
d = {} # avoid using 'dict' as variable name
for word in line:
word = word.lower() # call lower() only once
key = ''.join(sorted(word))
if key in d: # no need to call keys()
d[key].append(word)
else:
d[key] = [word] # you can initialize list with the initial value
return d # just return the mapping to process it later
if __name__ == '__main__':
d = make_anagram_dict(line)
for words in d.values():
if len(words) > 1: # several anagrams in this group
print('Anagrams: {}'.format(', '.join(words)))
Also, consider using defaultdict
- it's a dictionary, that creates values of a specified type for fresh keys.
from collections import defaultdict
with open('words.txt', 'r') as fp:
line = fp.readlines()
def make_anagram_dict(line):
d = defaultdict(list) # argument is the default constructor for value
for word in line:
word = word.lower() # call lower() only once
key = ''.join(sorted(word))
d[key].append(word) # now d[key] is always list
return d # just return the mapping to process it later
if __name__ == '__main__':
d = make_anagram_dict(line)
for words in d.values():
if len(words) > 1: # several anagrams in this group
print('Anagrams: {}'.format(', '.join(words)))
Upvotes: 3