Delfino

Reputation: 1019

Python - Counting Words In A Text File

I'm new to Python and am working on a program that counts the instances of each word in a simple text file. The program is run from the command line with the text file as an argument, so I have included code that reads the command-line arguments. The code is below:

import sys

count={}

with open(sys.argv[1],'r') as f:
    for line in f:
        for word in line.split():
            if word not in count:
                count[word] = 1
            else:
                count[word] += 1

print(word,count[word])

file.close()

count is a dictionary to store the words and the number of times they occur. I want to be able to print out each word and the number of times it occurs, starting from most occurrences to least occurrences.

I'd like to know if I'm on the right track, and if I'm using sys properly. Thank you!!

Upvotes: 3

Views: 5462

Answers (4)

Nic Beltrante

Reputation: 1

I just did something similar using the re library. It was for finding the average number of words per line in a text file, but for that you also have to count the words.

import re

# This program gets the average number of words per line.
def main():
    # Get the name of the file.
    filename = input('Enter a filename: ')
    try:
        # Open the file and read its contents; the with block closes it.
        with open(filename, 'r') as infile:
            contents = infile.read()
    except IOError:
        print('An error occurred when trying to read')
        print('the file', filename)
        return

    # Count lines with splitlines() so a last line without a trailing
    # newline is still counted.
    lines = len(contents.splitlines())
    count = len(re.findall(r'\w+', contents))
    # Guard against an empty file to avoid dividing by zero.
    average = count // lines if lines else 0

    # Display the file contents and the result.
    print(contents)
    print('there is an average of', average, 'words per line')

# Call main.
main()

Upvotes: 0

PM 2Ring

Reputation: 55499

I just noticed a typo: you open the file as f but you close it as file. As tripleee said, you don't need to close files that you open in a with statement. Also, it's bad practice to use the names of builtins, like list (or file in Python 2), for your own identifiers. Sometimes it works, but sometimes it causes nasty bugs, and it's confusing for people who read your code. A syntax-highlighting editor can help you avoid this little problem.
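As a small illustration of the kind of bug that shadowing a builtin can cause, here is a hedged sketch. The function name shadowing_demo is made up for this example and is not part of the question's code.

```python
# Hypothetical illustration: rebinding a builtin name hides the builtin.
def shadowing_demo():
    list = [1, 2, 3]      # 'list' now refers to this particular list object
    try:
        list("abc")       # the builtin list() is no longer reachable by this name
    except TypeError as exc:
        return str(exc)
    return None

message = shadowing_demo()
print(message)
```

The call fails because the name now points at a list object rather than the builtin type, which is why picking a different identifier is the safer habit.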

To print the data in your count dict in descending order of count you can do something like this:

# count.items() is a view in Python 3, so copy it into a list before sorting.
items = list(count.items())
items.sort(key=lambda kv: kv[1], reverse=True)
print('\n'.join('%s: %d' % (k, v) for k, v in items))

See the Python Library Reference for more details on the list.sort() method and other handy dict methods.

Upvotes: 0

tripleee

Reputation: 189936

Your final print doesn't have a loop, so it will just print the count for the last word you read, which still remains as the value of word.
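A minimal sketch of the missing loop, assuming count is a dict of word totals like the one built in the question. The helper name format_counts and the sample data are hypothetical; sorting by count is added here because the question asks for most to least occurrences.

```python
def format_counts(count):
    """Return one 'word count' line per entry, highest count first."""
    ordered = sorted(count.items(), key=lambda kv: kv[1], reverse=True)
    return ["%s %d" % (word, n) for word, n in ordered]

# Sample data standing in for the dict built from the file.
count = {"the": 3, "cat": 1, "sat": 2}

# The loop visits every entry instead of only the last word read.
for line in format_counts(count):
    print(line)
```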

Also, with a with context manager, you don't need to close() the file handle.

Finally, as pointed out in a comment, you'll want to remove the final newline from each line before you split.

For a simple program like this, it's probably not worth the trouble, but you might want to look at defaultdict from the collections module to avoid the special case for initializing a new key in the dictionary.
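A short sketch of the defaultdict approach, assuming the input is any iterable of text lines (an open file works the same way). The function name count_words and the sample lines are made up for illustration.

```python
from collections import defaultdict

def count_words(lines):
    """Count whitespace-separated words across an iterable of lines."""
    count = defaultdict(int)      # missing keys start at int(), which is 0
    for line in lines:
        for word in line.split():
            count[word] += 1      # no 'if word not in count' branch needed
    return count

# Sample lines standing in for the contents of the text file.
sample = ["the cat sat", "the cat", "the"]
counts = count_words(sample)
print(dict(counts))
```

With a real file, the same function could be called as count_words(f) inside the with block.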

Upvotes: 0

Brian Larsen

Reputation: 1756

What you did looks fine to me. You could also use collections.Counter (assuming you are on Python 2.7 or newer), which keeps the count of each word for you. My solution would look like this, though it could probably be improved.

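import sys
from collections import Counter

c = Counter()
with open(sys.argv[1], 'r') as f:
    for line in f:
        # update() expects an iterable of words; passing a single
        # string would count its characters instead.
        c.update(line.split())
# most_common() yields (word, count) pairs, highest count first.
for word, n in c.most_common():
    print(word, n)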

Upvotes: 3
