Heyszan

Python word count program from txt file

I'm trying to write a program that counts the 5 most common words in a txt file.

Here is what I have so far:

file = open('alice.txt')
wordcount = {}

for word in file.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1

for k, v in wordcount.items():
    print (k, v)

The program as it is counts every word in the .txt file.

My question is how to make it display only the 5 most common words, with each word's count next to it.

One catch: I can't use a dictionary... whatever that means.
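The question says dictionaries are off-limits. Assuming that means no dict (and no collections.Counter, which is a dict subclass), here is a minimal sketch that counts with lists only: sort the words so identical ones sit next to each other, then measure each run with itertools.groupby. The function names top_words_without_dict and top_words_in_file are hypothetical, chosen for illustration.

```python
from itertools import groupby


def top_words_without_dict(text, n=5):
    """Return the n most common words as (word, count) tuples, using only lists."""
    # Sorting puts identical words next to each other.
    words = sorted(text.split())
    # groupby yields one (word, run) pair per run of equal adjacent words.
    counts = [(word, len(list(run))) for word, run in groupby(words)]
    # Order by count, highest first; sort() is stable, so ties stay alphabetical.
    counts.sort(key=lambda pair: pair[1], reverse=True)
    return counts[:n]


def top_words_in_file(path, n=5):
    """Hypothetical helper: read a whole file and return its n most common words."""
    with open(path) as fh:
        return top_words_without_dict(fh.read(), n)
```

Usage would be something like `for word, count in top_words_in_file('alice.txt'): print(word, count)`. Like the original program, this treats "Alice" and "Alice," as different words, because it only splits on whitespace.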


Answers (3)

Fuji Komalan

File_Name = 'file.txt'

counterDict = {}

with open(File_Name,'r') as fh:
    # Read all lines into a list.
    data = fh.readlines()

for line in data:
    # Lower-case the line and strip commas and periods,
    # so "Alice," and "alice." both count as "alice".
    line = line.lower().replace(',', '').replace('.', '')
    # Split the line into a list of words.
    words = line.split()
    for word in words:
        # First time we see the word: start its count at 1.
        if word not in counterDict:
            counterDict[word] = 1
        # Word already seen: increase its count by one.
        else:
            counterDict[word] = counterDict[word] + 1

# Sort the (word, count) pairs by count, highest first.
# Each item x is a (word, count) tuple: x[0] is the word, x[1] is the count.
sorted_counterDict = sorted(counterDict.items(), reverse=True, key=lambda x: x[1])

# Print the first five entries, i.e. the five most common words.
for key,val in sorted_counterDict[0:5]:
    print(key,val)
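The replace(',', '').replace('.', '') chain only removes commas and periods, so words followed by "!", "?", ";" or quotes are still counted separately. A minimal sketch of a broader cleanup, assuming that every ASCII punctuation character should be deleted, uses str.translate with string.punctuation. The names normalize_words and count_words are hypothetical.

```python
import string

# Translation table that deletes every ASCII punctuation character.
STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize_words(line):
    """Lower-case a line, drop ASCII punctuation, and split it into words."""
    return line.lower().translate(STRIP_PUNCTUATION).split()


def count_words(lines):
    """Count words across an iterable of lines (e.g. an open file)."""
    counts = {}
    for line in lines:
        for word in normalize_words(line):
            # dict.get returns 0 for a word we have not seen yet.
            counts[word] = counts.get(word, 0) + 1
    return counts
```

The trade-off: apostrophes and hyphens are deleted too, so "don't" becomes "dont". Curly quotes are not in string.punctuation and would survive.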

Upvotes: 1

Remi Guan

Easy: sort the counted words by their counts and keep the first five.

So you could do something like this:

wordcount = sorted(wordcount.items(), key=lambda x: x[1], reverse=True)

The result is sorted by count, highest first. Note that sorted() returns a list of (word, count) tuples, not a dictionary.

You can use the following code to get the 5 most common words:

for k, v in wordcount[:5]:
    print(k, v)
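To see what sorted() hands back, here is a small check on a hypothetical three-word dictionary. It shows that the result is a list of tuples, which is why slicing with [:5] works, while the original dictionary is left untouched.

```python
wordcount = {"alice": 3, "rabbit": 1, "queen": 2}  # hypothetical sample counts

# sorted() on .items() builds a new list of (word, count) tuples;
# the original dictionary is not modified.
by_count = sorted(wordcount.items(), key=lambda x: x[1], reverse=True)
print(by_count)        # [('alice', 3), ('queen', 2), ('rabbit', 1)]
print(type(by_count))  # <class 'list'>
```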

So the full code looks like:

wordcount = {}

with open('alice.txt') as file:  # "with" closes the file automatically
    for word in file.read().split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1

wordcount = sorted(wordcount.items(), key=lambda x: x[1], reverse=True)

for k, v in wordcount[:5]:
    print(k, v)
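Sorting every entry just to keep five is fine for one book, but for a very large vocabulary heapq.nlargest is an alternative (a swapped-in technique, not part of the answer above). Its documentation describes it as equivalent to sorted(iterable, key=key, reverse=True)[:n]. The helper name top_n is hypothetical.

```python
import heapq


def top_n(wordcount, n=5):
    """Return the n (word, count) pairs with the highest counts."""
    # nlargest keeps only n candidates instead of sorting every entry.
    return heapq.nlargest(n, wordcount.items(), key=lambda x: x[1])
```

With a dictionary built by the loop above, `for k, v in top_n(wordcount): print(k, v)` would print the same five lines as the sorted() version.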

There is also a simpler way to do this with collections.Counter:

from collections import Counter
with open('alice.txt') as file:  # "with" closes the file automatically
    wordcount = Counter(file.read().split())

for k, v in wordcount.most_common(5):
    print(k, v)

The output is the same as with the first solution.
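Both versions split on whitespace only, so "The" and "the" are counted separately. A minimal sketch of a case-insensitive variant, still built on Counter.most_common, pulls words out with a regular expression. The definition of a "word" here (a run of ASCII letters, optionally joined by apostrophes) is an assumption, as is the helper name most_common_words.

```python
import re
from collections import Counter

# Assumed word definition: ASCII letters, optionally joined by apostrophes ("don't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def most_common_words(text, n=5):
    """Return the n most common words, ignoring case and punctuation."""
    words = WORD_RE.findall(text.lower())
    return Counter(words).most_common(n)
```

Non-ASCII letters are dropped by this pattern, so accented words would be split apart.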

Upvotes: 1

jp_

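The built-in sorted() function sorts a dictionary's keys. By default it compares the keys themselves (the words); passing key=wordcount.get sorts them by count instead:

sorted(wordcount, key=wordcount.get, reverse=True)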

Now it's up to you to figure out how to get/print only the first five elements ;)

Note: sorted() is not limited to dictionaries; it accepts any iterable and always returns a new list.
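For comparison, here is what sorting the dictionary by key and by count looks like on a hypothetical set of counts, followed by slicing the first five.

```python
wordcount = {"the": 4, "alice": 3, "a": 2, "rabbit": 1, "queen": 2, "cat": 1}  # hypothetical

# Sorting the dictionary itself sorts its keys alphabetically (descending here).
by_key = sorted(wordcount, reverse=True)

# key=wordcount.get sorts the keys by their counts instead; ties keep dict order.
top_five = sorted(wordcount, key=wordcount.get, reverse=True)[:5]

for word in top_five:
    print(word, wordcount[word])
```

Only the second form answers the question; the first one orders words alphabetically regardless of how often they appear.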

Upvotes: 0
