Estonia_girl

Reputation: 83

Count occurrences of a word in a text file

I have a text file in which I want to count the occurrences of the word "quack".

Example contents of the text file, named "quacker.txt":

This is the textfile quack. Oh, and how quack did quack do in his exams back in 2009?\n Well, he passed with nine P grades and one B.\n He says that quack he wants to go to university in the\n future but decided to try and make a career on YouTube before that Quack....\n So, far, it’s going very quack well Quack!!!!

So here I want 7 as the output.

readf= open("quacker.txt", "r")
lst= []
for x in readf:
  lst.append(str(x).rstrip('\n'))
readf.close()
#above gives a list of each row.
cv=0
for i in lst:
  if "quack" in i.strip():
    cv+=1

The code above only counts one "quack" per element of the list, even when a line contains several.

Upvotes: 1

Views: 80

Answers (3)

Padraic Cunningham

Reputation: 180522

You need to lowercase, strip and split to get an accurate count:

from string import punctuation
with open("test.txt") as f:
    quacks = sum(word.lower().strip(punctuation) == "quack"
                  for line in f for word in line.split())
    print(quacks)
7

You need to split each line in the file into individual words, or you will get false positives using in or count. word.lower().strip(punctuation) lowercases each word and strips any leading and trailing punctuation; sum then adds up the number of times word.lower().strip(punctuation) == "quack" is True.
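To illustrate the false positives, here is a small sketch comparing substring counting with whole-word counting. The sample line is made up for the illustration and is not taken from quacker.txt.

```python
from string import punctuation

# Hypothetical sample line, not taken from the question's file.
line = "Quack!!!! said the quacking duck, then quack."

# Substring counting also matches the 'quack' inside 'quacking'.
substring_hits = line.lower().count("quack")

# Whole-word counting: split on whitespace, lowercase each word,
# then trim punctuation from both ends before comparing.
words = [word.lower().strip(punctuation) for word in line.split()]
whole_word_hits = sum(word == "quack" for word in words)

print(substring_hits)   # 3
print(whole_word_hits)  # 2
```

Note that strip(punctuation) only removes characters at the ends of a word, so something like "quack's" would still not compare equal to "quack".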

In your own code, x is already a string, so calling str(x) is unnecessary. You could also check each line the first time you iterate; there is no need to add the strings to a list and then iterate a second time. The reason you only get one is most likely that all the data is actually on a single line. You are also comparing quack to Quack, which will not match; you need to lowercase the string.
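As a rough sketch of that single-pass idea, the loop below keeps the shape of the original code but counts while iterating, without the intermediate list. The function name count_word and the sample text are assumptions made for the example; io.StringIO only stands in for an open file.

```python
import io
from string import punctuation

def count_word(lines, target="quack"):
    """Count whole-word, case-insensitive matches of target in one pass."""
    count = 0
    for line in lines:                  # each line is already a str
        for word in line.split():
            if word.lower().strip(punctuation) == target:
                count += 1
    return count

# io.StringIO stands in for an open file here; a file object returned by
# open("quacker.txt") can be iterated line by line in the same way.
sample = io.StringIO("This is the textfile quack.\nIt's going very quack well Quack!!!!\n")
print(count_word(sample))  # 3
```

With a real file, the call would look like `with open("quacker.txt") as f: print(count_word(f))`.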

Upvotes: 1

Totem

Reputation: 7369

Well, if the file isn't too long, you could try:

with open('quacker.txt') as f:
    text = f.read().lower() # make it all lowercase so the count works below
    quacks = text.count('quack')

As @PadraicCunningham mentioned in the comments, this would also count the 'quack' in words like 'quacks' or 'quacking'. But if that's not an issue, then this is fine.
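If the extra matches are an issue, one possible alternative (not part of the answer above) is a regular expression with word boundaries. This is only a sketch; count_whole_word and the sample text are names and data invented for the illustration.

```python
import re

def count_whole_word(text, word="quack"):
    """Count case-insensitive matches of word as a whole word only."""
    # \b matches at a word boundary, so 'quacks' and 'quacking' are skipped.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# Hypothetical sample text, not taken from quacker.txt.
text = "Quack!!!! The quacking duck quacks, then one last quack."

print(text.lower().count("quack"))  # 4, includes 'quacking' and 'quacks'
print(count_whole_word(text))       # 2
```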

Upvotes: 2

yurib

Reputation: 8147

You're incrementing by one if the line contains the string, but what if the line has several occurrences of 'quack'?

Try:

for line in lst:
    for word in line.split():
        # lowercase first so that "Quack" is counted as well
        if 'quack' in word.lower():
            cv += 1

Upvotes: 1
