turf

Reputation: 183

How to remove duplicates in a document?

I'm writing a word-unjumble program. Here is my code:

import collections

# Map each alphabetically-sorted letter string to the words it can spell.
sortedWords = collections.defaultdict(list)

with open("/xxx/xxx/words.txt", "r") as f:
    for word in f:
        word = word.strip().lower()
        sortFword = ''.join(sorted(word))
        sortedWords[sortFword].append(word)

while True:
    jumble = input("Enter your jumbled word:").lower()
    sortedJumble = ''.join(sorted(jumble))

    # Look up all dictionary words that share the jumble's sorted letters.
    if sortedJumble in sortedWords:
        words = sortedWords[sortedJumble]
        if len(words) > 1:
            print("Your words are: ")
            print("\n".join(words))
        else:
            print("Your word is", words[0]+".")
        break

    else:
        print("Oops, it can not be unjumbled.")
        break

Now this code works. However, my program sometimes prints two identical words. For example, I typed "prisng" as the jumbled word and got two "spring"s. That happens because the word file contains two "spring"s: one is "spring" and the other is "Spring", and both become "spring" after lowercasing. I want to remove all duplicates from words.txt, but how do I do that? Please give me some advice.

Upvotes: 0

Views: 61

Answers (1)

Zach Gates

Reputation: 4155

You can use the built-in set type to do this.

words = ['hi', 'Hi']
words = list(map(lambda x: x.lower(), words)) # makes all the words lowercase
words = list(set(words)) # removes all duplicates
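If you want to fold that into the program from the question, one option (just a sketch, reusing the loading loop and the /xxx/xxx/words.txt path shown above) is to have the defaultdict store sets instead of lists, so each word is kept only once per anagram key no matter how many times, or in what case, it appears in the file:

import collections

# Sketch: store each anagram group in a set so duplicates collapse on insert.
sortedWords = collections.defaultdict(set)

with open("/xxx/xxx/words.txt", "r") as f:
    for word in f:
        word = word.strip().lower()
        sortFword = ''.join(sorted(word))
        sortedWords[sortFword].add(word)  # adding "spring" a second time keeps one copy

One small adjustment in the lookup loop: a set cannot be indexed, so change words = sortedWords[sortedJumble] to words = list(sortedWords[sortedJumble]) before using words[0]. Alternatively, keep the lists and deduplicate once after loading, e.g. sortedWords = {k: list(set(v)) for k, v in sortedWords.items()}; the membership test and printing code then work unchanged.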

Upvotes: 1
