SirSnitch
SirSnitch

Reputation: 77

Finding anagrams using dictionary in Python

I'm trying to create a function in python that will print out the anagrams of words in a text file using dictionaries. I've looked at what feels like hundreds of similar questions, so I apologise if this is a repetition, but I can't seem to find a solution that fits my issue.

I understand what I need to do (at least, I think so), but I'm stuck on the final part.

This is what I have so far:

with open('words.txt', 'r') as fp:
    line = fp.readlines()

def make_anagram_dict(line):
    dict = {}
    for word in line:
        key = ''.join(sorted(word.lower()))
        if key in dict.keys():
            dict[key].append(word.lower())
        else:
            dict[key] = []
            dict[key].append(word.lower())
    if line == key:
        print(line)


make_anagram_dict(line)

I think I need something which compares the key of each value to the keys of other values, and then prints if they match, but I can't get something to work.

At the moment, the best I can do is print out all the keys and values in the file, but ideally, I would be able to print all the anagrams from the file.

Output: I don't have a concrete specified output, but something along the lines of: [cat: act, tac]

for each anagram. Again, apologies if this is a repetition, but any help would be greatly appreciated.

Upvotes: 2

Views: 6961

Answers (3)

sal
sal

Reputation: 3593

Your code is pretty much there, just needs some tweaks:

import re

def make_anagram_dict(words):
    d = {}  
    for word in words:
        word = word.lower()          # call lower() only once
        key = ''.join(sorted(word))  # make the key
        if key in d:  # check if it's in dictionary already
            if word not in d[key]:   # avoid duplicates
                d[key].append(word)
        else:
            d[key] = [word]  # initialize list with the initial value
    return d                         # return the entire dictionary

if __name__ == '__main__':
    filename = 'words.txt'
    with open(filename) as file:
        # Use regex to extract words. You can adjust to include/exclude 
        # characters, numbers, punctuation...
        # This returns a list of words
        words = re.findall(r"([a-zA-Z\-]+)", file.read())

    # Now process them 
    d = make_anagram_dict(words)  

    # Now print them
    for words in d.values():
        if len(words) > 1:  # we found anagrams
            print('Anagram group {}: {}'.format(', '.join(words)))

Upvotes: 0

Rob Truxal
Rob Truxal

Reputation: 6408

I'm going to make the assumption you're grouping words within a file which are anagrams of eachother.

If, on the other hand, you're being asked to find all the English-language anagrams for a list of words in a file, you will need a way of determining what is or isn't a word. This means you either need an actual "dictionary" as in a set(<of all english words>) or a maybe a very sophisticated predicate method.

Anyhow, here's a relatively straightforward solution which assumes your words.txt is small enough to be read into memory completely:

with open('words.txt', 'r') as infile:
    words = infile.read().split()

anagram_dict = {word : list() for word in words}

for k, v in anagram_dict.items():
     k_anagrams = (othr for othr in words if (sorted(k) == sorted(othr)) and (k != othr))
     anagram_dict[k].extend(k_anagrams)

print(anagram_dict)

This isn't the most efficient way to do this, but hopefully it gets accross the power of filtering.

Arguably, the most important thing here is the if (sorted(k) == sorted(othr)) and (k != othr) filter in the k_anagrams definition. This is a filter which only allows identical letter-combinations, but weeds out exact matches.

Upvotes: 1

Viacheslav Kroilov
Viacheslav Kroilov

Reputation: 1671

I'm not sure about the output format. In my implementation, all anagrams are printed out in the end.

with open('words.txt', 'r') as fp:
    line = fp.readlines()

def make_anagram_dict(line):
    d = {}  # avoid using 'dict' as variable name

    for word in line:
        word = word.lower()  # call lower() only once
        key = ''.join(sorted(word))
        if key in d:  # no need to call keys()
            d[key].append(word)
        else:
            d[key] = [word]  # you can initialize list with the initial value

    return d  # just return the mapping to process it later

if __name__ == '__main__':
    d = make_anagram_dict(line)

    for words in d.values():
        if len(words) > 1:  # several anagrams in this group
            print('Anagrams: {}'.format(', '.join(words)))

Also, consider using defaultdict - it's a dictionary, that creates values of a specified type for fresh keys.

from collections import defaultdict

with open('words.txt', 'r') as fp:
    line = fp.readlines()

def make_anagram_dict(line):
    d = defaultdict(list)  # argument is the default constructor for value

    for word in line:
        word = word.lower()  # call lower() only once
        key = ''.join(sorted(word))
        d[key].append(word)  # now d[key] is always list

    return d  # just return the mapping to process it later

if __name__ == '__main__':
    d = make_anagram_dict(line)

    for words in d.values():
        if len(words) > 1:  # several anagrams in this group
            print('Anagrams: {}'.format(', '.join(words)))

Upvotes: 3

Related Questions