user11804659
user11804659

Reputation:

Check anagrams within list using dictionaries

I have a word list and need to find all the anagrams within the word list.

I have already tried to create the function/dictionary however I run into memory issues

words = word_list
sort_words = []
anagrams = {}
for word in words:
    word.split()
    word = ''.join(sorted(word))
    sort_words.append(word)

for i in range(len(sort_words)):
    word_anagram = []
    for j in range(len(sort_words)):
        if i == j:
            continue
        if sort_words[i] == sort_words[j]:
            word_anagram.append(words[j])
    anagrams[words[i]] = word_anagram 
print(anagrams)

Is there a different key I should use to get rid of the memory error? I think this will hep as there will be repeats in the anagrams that the function find. If so, what key should I use?

Upvotes: 0

Views: 156

Answers (2)

inspectorG4dget
inspectorG4dget

Reputation: 113905

words = word_list
answer = {}
for word in words:
    answer.setdefault(''.join(sorted(word)), []).append(word)

Each word in word_list is sorted, and associated with that as the key.

Therefore, anagrams are associated by the sorted letters, and all anagrams appear in a list together.

This has linear space complexity, so you shouldn't run out of memory.

Upvotes: 2

dildeolupbiten
dildeolupbiten

Reputation: 1342

Suppose your word is "0123456789". As you can see the length of the string is 10. If you want to create anagrams like below:

0123456789
1023456789
1203456789

which the items are used only one time, you would get 10! possible anagrams. 10! equals to 3628800. Each word has 10 digits which means 10 bytes. 10! * 10 = 36288000 bytes. That means 36.288 MB. I am ignoring the "\n" escape sequence in this case. If we need to consider this escape sequence, You would need 10! * 11 bytes which is equal to 39.9168 MB.

If the length of the word becomes 11, you would need 439084800 bytes (11! * 11 = 439084800). And that equals to 439.0848 MB. If we need to consider the "\n" escape sequence, this time you would need 479.0016 MB.

If the length of the word becomes 12, you would need 5748019200 bytes (12! * 12 = 5748019200). And that equals to 5.7480192 GB. If we need to consider the "\n" escape sequence, this time you would need 6.2270208 GB.

So when making such calculations, you need to know whether the size of the data to be produced can be done by your hardware.

Upvotes: 0

Related Questions