Reputation: 782
I've created a simple word count program in Python that reads a text file, counts word frequencies, and writes the result to another file. The problem is that when a word is repeated, the program writes every intermediate count for that word as well as the final one. For example, if the word "hello" appears 3 times, the program writes 3 lines for hello in the output:
Word - Frequency Count
hello - 1
hello - 2
hello - 3
The code is:
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1
    outfile.write(w + ',' + str(counts[w]) + '\n')
Any help would be appreciated. I'm very new to Python.
Upvotes: 1
Views: 9574
Reputation: 4167
The way to make the code work:
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1
for w in counts:
    outfile.write(w + ',' + str(counts[w]) + '\n')
But I think Burhan Khalid's suggestion of using Counter is a better way to solve the problem.
Upvotes: 1
Reputation: 174614
The actual way to solve this is to use Counter, like this:
>>> from collections import Counter
>>> words = ['b','b','the','the','the','c']
>>> Counter(words).most_common()
[('the', 3), ('b', 2), ('c', 1)]
The other way to solve it is by using a defaultdict, which will work just like the Counter example above:
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for word in words:
...     d[word] += 1
...
>>> d
defaultdict(<type 'int'>, {'the': 3, 'b': 2, 'c': 1})
No matter how you count the words, you can only write to the file once all words are counted; otherwise you are writing one line for each increment, and as soon as a word appears more than once, you will have duplicate entries for it in your output.
So, first collect the counts, then write them out.
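Putting the two steps together, here is a minimal end-to-end sketch. The function name write_word_counts, the temporary file names, and the sample text are hypothetical, and it assumes words are separated by whitespace, which may not match how the original words list was built:

```python
from collections import Counter
import os
import tempfile


def write_word_counts(in_path, out_path):
    """Count whitespace-separated words in in_path, write 'word,count' lines to out_path."""
    with open(in_path, encoding="utf-8") as infile:
        words = infile.read().split()

    # Step 1: collect all the counts before writing anything.
    counts = Counter(words)

    # Step 2: write each distinct word exactly once, most frequent first.
    with open(out_path, "w", encoding="utf-8") as outfile:
        for word, count in counts.most_common():
            outfile.write(f"{word},{count}\n")
    return counts


# Small demonstration with temporary files (hypothetical sample text).
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "input.txt")
    out_path = os.path.join(tmp, "output.txt")
    with open(in_path, "w", encoding="utf-8") as f:
        f.write("hello world hello hello\n")
    write_word_counts(in_path, out_path)
    with open(out_path, encoding="utf-8") as f:
        result = f.read()

print(result)
```

With this input the output file holds one line for hello and one for world, rather than one line per occurrence.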
Upvotes: 5
Reputation: 2328
Have you considered first storing the frequency count in your program, then writing it all at the end? It would certainly be simpler than rewriting the output file for every count.
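To make the difference concrete, this rough sketch runs both approaches on the same made-up word list, using in-memory io.StringIO buffers as stand-ins for the real output file:

```python
import io

words = ["hello", "world", "hello", "hello"]  # hypothetical sample input

# Writing inside the counting loop: one line per occurrence,
# which is the behaviour described in the question.
running = io.StringIO()
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1
    running.write(w + "," + str(counts[w]) + "\n")

# Writing only after counting has finished: one line per distinct word.
final = io.StringIO()
for w, n in counts.items():
    final.write(w + "," + str(n) + "\n")

running_lines = running.getvalue().splitlines()
final_lines = final.getvalue().splitlines()

print(running_lines)
print(final_lines)
```

The first buffer ends up with four lines, including the intermediate counts for hello, while the second has only the final count for each word.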
Upvotes: 0