Mohd Bilal

Reputation: 101

Calculating word frequencies from a text file, but some counts in my output are wrong

Read all the lines from the file and split them into words using the split() method. Then remove punctuation from the ends of the words using the strip("""!"#$%&'()*,-./:;?@[]_""") method call.
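To make the task statement concrete, here is a minimal sketch of the split-then-strip step on a made-up sample string. The sample text and the helper name clean_words are assumptions for illustration only; they are not part of the exercise. The punctuation set is copied from the task statement above, including its trailing underscore.

```python
# Punctuation set copied from the task statement (note the trailing underscore).
PUNCTUATION = """!"#$%&'()*,-./:;?@[]_"""


def clean_words(text):
    """Split text on whitespace and strip punctuation from both ends of each word."""
    cleaned = []
    for word in text.split():
        stripped = word.strip(PUNCTUATION)
        if stripped:  # skip tokens that consisted only of punctuation
            cleaned.append(stripped)
    return cleaned


# Hypothetical sample text, not taken from alice.txt.
sample = 'The "Project" _Gutenberg_ e-book, don\'t -- stop!'
print(clean_words(sample))
```

Note that strip() only removes characters from the two ends of a word, so the hyphen in e-book and the apostrophe in don't are left alone.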

I am a beginner in Python and am trying to solve some basic problems. I used the split and strip functions as the problem asks, but the frequencies of some words come out wrong. Please review my code.

Python Code:

def word_frequencies(filename="alice.txt"):

    with open(filename) as f:
        string=f.read()

    words=string.split()

    l=[]

    for word in words:
        temp=word.strip("""!"#$%&'()*,-./:;?@[]""")

        if temp:
            l.append(temp)


    string2=[]

    for i in l:
        if i not in string2:
            string2.append(i)

    for j in string2:
        print(f"{j}\t{l.count(j)}")

Output:

The 64 

Project 83

Gutenberg   27

EBook   3

of  303

. . . and so on.

But the expected output is:

The     64

Project 83

Gutenberg   26

EBook   3

of      303

. . . and so on

Upvotes: 0

Views: 85

Answers (1)

Arsegg

Reputation: 126

Use re.findall to split the text into words:

from re import findall

words = findall(r"\b\w+\b", text)

Here, text is the result of your f.read().

Then count them:

from collections import Counter

c = Counter(words)

Check word count:

for word in ("The", "Project", "Gutenberg", "EBook",):
    print(word, c[word])

Prints:

The 64
Project 83
Gutenberg 83
EBook 3
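
Put together, the two steps above can be sketched as one self-contained snippet. The helper name count_words and the sample string are assumptions for illustration; with the real file you would pass in the result of f.read() instead.

```python
from collections import Counter
from re import findall


def count_words(text):
    """Count the words matched by the regular expression used in this answer."""
    return Counter(findall(r"\b\w+\b", text))


# Hypothetical sample text, not taken from alice.txt.
sample = "The Project Gutenberg EBook. The Project, the project!"
counts = count_words(sample)

for word in ("The", "Project", "Gutenberg", "EBook"):
    print(word, counts[word])
```

Two things to keep in mind with this pattern: counting is case-sensitive, so "The" and "the" are separate keys, and \w also matches the underscore, so a token such as _Gutenberg_ keeps its underscores rather than having them stripped.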

Upvotes: 1
