JayG747

Reputation: 21

How to create a dictionary for a text file

My program opens a file and can count the words in it, but I want to create a dictionary of all the unique words in the text. For example, if the word 'computer' appears three times, I want that to count as one unique word.

def main():
    file = input('Enter the name of the input file: ')

    # Read the whole file; the with block closes it automatically.
    with open(file, 'r') as infile:
        file_contents = infile.read()

    # Split on whitespace to get a list of words.
    words = file_contents.split()
    number_of_words = len(words)

    print("There are", number_of_words, "words contained in this paragraph")

main()

Upvotes: 2

Views: 95

Answers (3)

TheBlackCat

Reputation: 10328

Use a set. This will only include unique words:

words = set(words)

If you don't care about case, you can do this:

words = set(word.lower() for word in words)

This assumes there is no punctuation. If there is, you will need to strip it from the start and end of each word:

import string
words = set(word.lower().strip(string.punctuation) for word in words)
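To make the punctuation stripping above concrete, here is a minimal sketch. The function name `unique_words` and the sample sentence are made up for illustration. It also drops tokens that consist only of punctuation, which would otherwise end up in the set as an empty string.

```python
import string

def unique_words(text):
    """Return the distinct words in text, ignoring case and edge punctuation."""
    cleaned = (word.lower().strip(string.punctuation) for word in text.split())
    # Tokens made only of punctuation (e.g. "--") become empty strings; drop them.
    return {word for word in cleaned if word}

sample = "The computer, the Computer -- and the COMPUTER!"
print(sorted(unique_words(sample)))  # ['and', 'computer', 'the']
```

Note that `strip` only removes characters from the ends of each word, so an apostrophe inside a word such as "don't" is left alone.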

If you need to keep track of how many of each word you have, just replace set with Counter in the examples above:

import string
from collections import Counter
words = Counter(word.lower().strip(string.punctuation) for word in words)

This will give you a dictionary-like object that tells you how many times each word occurs.

You can also get the number of unique words from this (although it is slower if that is all you care about):

import string
from collections import Counter
words = Counter(word.lower().strip(string.punctuation) for word in words)
nword = len(words)
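As a rough illustration of what the Counter gives you, here is a small self-contained sketch. The sample text is invented; in the question's program you would feed in the words read from the file instead.

```python
import string
from collections import Counter

text = "The cat saw the other cat. The end."
counts = Counter(word.lower().strip(string.punctuation) for word in text.split())

print(counts["the"])          # occurrences of a single word: 3
print(counts["dog"])          # missing words count as 0 instead of raising KeyError
print(counts.most_common(2))  # [('the', 3), ('cat', 2)]
print(len(counts))            # number of unique words: 5
```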

Upvotes: 2

Ihor Pomaranskyy

Reputation: 5651

Probably a cleaner and quicker solution:

words_dict = {}
for word in words:
    word_count = words_dict.get(word, 0)
    words_dict[word] = word_count + 1
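For anyone who wants to run the loop above on its own, here is a sketch that wraps it in a function (the name `count_words` and the sample words are just for illustration) and checks it against `collections.Counter`, which should give the same counts.

```python
from collections import Counter

def count_words(words):
    """Count occurrences of each word using a plain dict and dict.get."""
    words_dict = {}
    for word in words:
        # get() returns 0 for words not seen yet, so no membership test is needed.
        words_dict[word] = words_dict.get(word, 0) + 1
    return words_dict

words = "Foo Bar Baz Baz".split()
result = count_words(words)
print(result)                    # {'Foo': 1, 'Bar': 1, 'Baz': 2}
print(result == Counter(words))  # True
```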

Upvotes: 0

zazga

Reputation: 366

@TheBlackCat's solution works, but it only tells you how many unique words are in the string/file. This solution also shows how many times each word occurs.

dictionaryName = {}
for word in words:
    # Membership tests on a dict check its keys directly; no list() copy needed.
    if word not in dictionaryName:
        dictionaryName[word] = 1
    else:
        dictionaryName[word] = dictionaryName[word] + 1
print(dictionaryName)

Tested with:

words = "Foo", "Bar", "Baz", "Baz"
output: {'Foo': 1, 'Bar': 1, 'Baz': 2}

Upvotes: 0
