Reputation: 1019
I'm new to Python and am working on a program that counts the occurrences of each word in a simple text file. The program takes the name of the text file from the command line, so I have included code that reads the command-line argument. The code is below:
import sys

count={}

with open(sys.argv[1],'r') as f:
    for line in f:
        for word in line.split():
            if word not in count:
                count[word] = 1
            else:
                count[word] += 1

print(word,count[word])

file.close()
count is a dictionary to store the words and the number of times they occur. I want to be able to print out each word and the number of times it occurs, starting from most occurrences to least occurrences.
I'd like to know if I'm on the right track, and if I'm using sys properly. Thank you!!
Upvotes: 3
Views: 5462
Reputation: 1
I did something similar using the re library. My program computes the average number of words per line in a text file, so you would have to adapt it to count what you need.
import re

# This program gets the average number of words per line.
def main():
    try:
        # Get the name of the file.
        filename = input('Enter a filename:')
        # Open the file.
        infile = open(filename, 'r')
        # Read the file contents.
        contents = infile.read()
        # Count the newline characters and the words.
        line = len(re.findall(r'\n', contents))
        count = len(re.findall(r'\w+', contents))
        average = count // line
        # Display the file contents.
        print(contents)
        print('there is an average of', average, 'words per line')
        # Close the file.
        infile.close()
    except IOError:
        print('An error occurred when trying to read')
        print('the file', filename)

# Call main.
main()
Upvotes: 0
Reputation: 55499
I just noticed a typo: you open the file as f but you close it as file. As tripleee said, you shouldn't close files that you open in a with statement. Also, it's bad practice to use the names of builtin functions, like file or list, for your own identifiers. Sometimes it works, but sometimes it causes nasty bugs. And it's confusing for people who read your code; a syntax highlighting editor can help avoid this little problem.
To print the data in your count dict in descending order of count you can do something like this:
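# Copy the items into a list so that list.sort() is available on Python 3 as well.
items = list(count.items())
# Sort by the count (second element of each pair), largest first.
items.sort(key=lambda kv: kv[1], reverse=True)
print('\n'.join('%s: %d' % (k, v) for k, v in items))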
See the Python Library Reference for more details on the list.sort() method and other handy dict methods.
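The same ordering can also be produced with sorted(), which takes any iterable, including the view that dict.items() returns on Python 3. This is only a minimal sketch: the helper name format_counts and the sample data are made up for illustration and stand in for the count dict from the question.

```python
def format_counts(count):
    """Return 'word: n' lines, highest count first (hypothetical helper)."""
    # sorted() accepts any iterable, so it works on the items() view directly.
    items = sorted(count.items(), key=lambda kv: kv[1], reverse=True)
    return '\n'.join('%s: %d' % (k, v) for k, v in items)


# Sample data standing in for the question's count dict.
count = {'spam': 3, 'eggs': 1, 'ham': 2}
print(format_counts(count))
```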
Upvotes: 0
Reputation: 189936
Your final print doesn't have a loop, so it will just print the count for the last word you read, which still remains as the value of word.
Also, with a with context manager, you don't need to close() the file handle.
Finally, as pointed out in a comment, you could remove the final newline from each line before you split, although split() with no arguments already discards it along with any other surrounding whitespace.
For a simple program like this, it's probably not worth the trouble, but you might want to look at defaultdict from collections to avoid the special case for initializing a new key in the dictionary.
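The defaultdict suggestion, together with a printing loop that covers every word, might look something like the following. This is only a sketch: the function name count_words and the sample lines are assumptions for illustration, and in the real program the lines would come from the file named in sys.argv[1].

```python
from collections import defaultdict


def count_words(lines):
    """Count whitespace-separated words in an iterable of lines."""
    # int() returns 0, so a missing key starts at zero automatically.
    count = defaultdict(int)
    for line in lines:
        for word in line.split():
            count[word] += 1
    return count


# Sample input standing in for the lines of the file.
sample = ["the quick brown fox\n", "the lazy dog\n", "the end\n"]
count = count_words(sample)

# Loop over every entry instead of printing only the last word seen.
for word in sorted(count, key=count.get, reverse=True):
    print(word, count[word])
```

Because a file object is itself an iterable of lines, count_words(f) should work unchanged inside a with open(...) as f block.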
Upvotes: 0
Reputation: 1756
What you did looks fine to me. One could also use collections.Counter (assuming you are on Python 2.7 or newer), which gives you a bit more information, like the count of each word. My solution would look like this; some improvement is probably possible.
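import sys
from collections import Counter

# Read all lines; the with block closes the file automatically.
with open(sys.argv[1], 'r') as f:
    lines = f.readlines()

c = Counter()
for line in lines:
    for word in line.strip().split():
        # Wrap the word in a list: update() on a bare string counts its characters.
        c.update([word])

for ind in c:
    print('%s %d' % (ind, c[ind]))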
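Since the question asks for output ordered from most to least frequent, Counter.most_common() may be a convenient fit. A rough sketch, where the helper name top_words and the sample text are invented for illustration:

```python
from collections import Counter


def top_words(lines):
    """Return (word, count) pairs, most frequent first."""
    c = Counter()
    for line in lines:
        # Passing a list of words counts words; a bare string would count characters.
        c.update(line.split())
    # With no argument, most_common() returns every element, highest count first.
    return c.most_common()


# Sample input standing in for the lines of the file.
sample = ["one two two\n", "three three three\n"]
for word, n in top_words(sample):
    print(word, n)
```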
Upvotes: 3