sulav_lfc

Reputation: 782

Python word frequency count program

I've created a simple word count program in Python that reads a text file, counts the frequency of each word, and writes the result to another file. The problem is that when a word is repeated, the program writes every intermediate count for that word as well as the final one. For example, if the word "hello" appears 3 times, the program writes 3 lines for "hello" in the output:

Word - Frequency Count

hello - 1

hello - 2

hello - 3

The code is:

counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1
    outfile.write(w + ',' + str(counts[w]) + '\n')

Any help would be appreciated. I'm very new to Python.

Upvotes: 1

Views: 9574

Answers (3)

towr

Reputation: 4167

The way to make the code work:

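# First pass: count every word.
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1

# Second pass: write each distinct word once, with its final count.
for w in counts:
    outfile.write(w + ',' + str(counts[w]) + '\n')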

But I think Burhan Khalid's suggestion of using Counter is a better way to solve the problem.
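For reference, here is a minimal sketch of the two-loop fix that can be run end to end. The function name `write_counts` and the sample word list are made up for illustration, and `io.StringIO` stands in for the asker's real output file.

```python
import io


def write_counts(words, outfile):
    """Count every word first, then write one line per distinct word."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1

    # Writing happens only after the counting loop has finished,
    # so each word appears exactly once with its final count.
    for w in counts:
        outfile.write(w + ',' + str(counts[w]) + '\n')
    return counts


# io.StringIO is an in-memory stand-in for the output file (assumption).
buffer = io.StringIO()
write_counts(['hello', 'world', 'hello', 'hello'], buffer)
result = buffer.getvalue()
print(result)
```

With the write moved out of the counting loop, "hello" shows up on a single line with a count of 3 instead of on three separate lines.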

Upvotes: 1

Burhan Khalid

Reputation: 174614

The idiomatic way to solve this is to use collections.Counter, like this:

>>> from collections import Counter
>>> words = ['b','b','the','the','the','c']
>>> Counter(words).most_common()
[('the', 3), ('b', 2), ('c', 1)]

Another way to solve it is with a defaultdict, which produces the same counts as the Counter example above:

>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for word in words:
...    d[word] += 1
...
>>> d
defaultdict(<type 'int'>, {'the': 3, 'b': 2, 'c': 1})

No matter how you count the words, you should write to the file only once all the words have been counted. Otherwise you write one line per occurrence, and as soon as a word appears more than once it shows up multiple times in your output.

So, first collect the counts, then write them out.
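Putting "count first, write afterwards" together with Counter, a file-to-file version might look like the sketch below. The function name `count_words_to_file`, the file names, and the choice to split on whitespace are assumptions for illustration. The question does not show how `words` is built, and this sketch does not strip punctuation or normalize case.

```python
import os
import tempfile
from collections import Counter


def count_words_to_file(in_path, out_path):
    """Read in_path, count whitespace-separated words, write 'word,count' lines."""
    with open(in_path) as infile:
        words = infile.read().split()  # assumption: words are whitespace-separated

    counts = Counter(words)  # all counting is finished before any writing starts

    with open(out_path, 'w') as outfile:
        # most_common() yields (word, count) pairs, highest count first.
        for word, count in counts.most_common():
            outfile.write('{},{}\n'.format(word, count))
    return counts


# Small demonstration using a temporary directory (hypothetical file names).
tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, 'input.txt')
out_path = os.path.join(tmp_dir, 'output.txt')

with open(in_path, 'w') as f:
    f.write('the b the c b the\n')

count_words_to_file(in_path, out_path)

with open(out_path) as f:
    output_lines = f.read().splitlines()
print(output_lines)
```

Each distinct word is written exactly once, because the output loop runs over the finished Counter and not over the original word list.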

Upvotes: 5

itsmichaelwang

Reputation: 2328

Have you considered first storing the frequency count in your program, then writing it all at the end? It would certainly be simpler than rewriting the output file for every count.

Upvotes: 0
