Reputation: 43
I'm trying to count the number of times the word 'the' appears in two books saved as text files. The code I'm running returns zero for each book.
Here's my code:
def word_count(filename):
    """Count specified words in a text"""
    try:
        with open(filename) as f_obj:
            contents = f_obj.readlines()
            for line in contents:
                word_count = line.lower().count('the')
            print (word_count)
    except FileNotFoundError:
        msg = "Sorry, the file you entered, " + filename + ", could not be found."
        print (msg)

dracula = 'C:\\Users\\HP\\Desktop\\Programming\\Python\\Python Crash Course\\TEXT files\\dracula.txt'
siddhartha = 'C:\\Users\\HP\\Desktop\\Programming\\Python\\Python Crash Course\\TEXT files\\siddhartha.txt'
word_count(dracula)
word_count(siddhartha)
What am I doing wrong here?
Upvotes: 1
Views: 162
Reputation: 225
Another way is to read the whole file into a single string and count once:
def word_count(filename):
    with open(filename) as f_obj:
        contents = f_obj.read()
    print("The word 'the' appears " + str(contents.lower().count('the')) + " times")
Upvotes: 1
Reputation: 19
import os

def word_count(filename):
    """Count specified words in a text"""
    if os.path.exists(filename):
        if not os.path.isdir(filename):
            with open(filename) as f_obj:
                # Count the substring 'the' (not just the letter 't')
                print(f_obj.read().lower().count('the'))
        else:
            print("Path points to a folder, not a file: '%s'" % filename)
    else:
        print("Path not found: '%s'" % filename)
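The same existence and directory checks can also be written with pathlib. This is a sketch rather than a drop-in replacement: it assumes the books are saved as UTF-8, and it returns the count as well as printing it so the result can be reused. Like the other answers, it still counts the substring 'the'.

```python
from pathlib import Path

def word_count(filename):
    """Return how many times the substring 'the' occurs in a text file, or None."""
    path = Path(filename)
    if not path.exists():
        print("Path not found: '%s'" % filename)
        return None
    if path.is_dir():
        print("Path points to a folder, not a file: '%s'" % filename)
        return None
    # Assumption: the books are UTF-8 encoded; change the encoding if they are not.
    count = path.read_text(encoding='utf-8').lower().count('the')
    print(count)
    return count
```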
Upvotes: 0
Reputation: 30250
Unless the word 'the' appears on the last line of each file, you'll see zeros: each pass through the loop overwrites word_count, so only the last line's count gets printed.
You likely want to initialize the word_count variable to zero and then use augmented addition (+=).
For example:
def word_count(filename):
    """Count specified words in a text"""
    try:
        word_count = 0  # <- change #1 here
        with open(filename) as f_obj:
            contents = f_obj.readlines()
            for line in contents:
                word_count += line.lower().count('the')  # <- change #2 here
        print(word_count)
    except FileNotFoundError:
        msg = "Sorry, the file you entered, " + filename + ", could not be found."
        print(msg)

dracula = 'C:\\Users\\HP\\Desktop\\Programming\\Python\\Python Crash Course\\TEXT files\\dracula.txt'
siddhartha = 'C:\\Users\\HP\\Desktop\\Programming\\Python\\Python Crash Course\\TEXT files\\siddhartha.txt'
word_count(dracula)
word_count(siddhartha)
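To see the difference between reassigning and accumulating without opening a book, here is a small sketch that runs both loops over a few hypothetical sample lines. The function names are illustrative only.

```python
# Hypothetical sample lines standing in for a book's contents.
lines = [
    "The Count opened the door.\n",
    "A bat flew at the window.\n",
    "Silence.\n",
]

def last_line_only(lines):
    """Mimics the question's bug: the variable is overwritten on every pass."""
    count = 0
    for line in lines:
        count = line.lower().count('the')
    return count

def accumulated(lines):
    """Mimics the fix: each line's count is added to a running total."""
    count = 0
    for line in lines:
        count += line.lower().count('the')
    return count

print(last_line_only(lines))  # 0 (the last line has no 'the')
print(accumulated(lines))     # 3 (2 + 1 + 0)
```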
Augmented addition isn't necessary, just helpful. This line:
word_count += line.lower().count('the')
could be written as
word_count = word_count + line.lower().count('the')
You also don't need to read all the lines into memory at once. You can iterate over the lines directly from the file object. For example:
def word_count(filename):
    """Count specified words in a text"""
    try:
        word_count = 0
        with open(filename) as f_obj:
            for line in f_obj:  # <- change here
                word_count += line.lower().count('the')
        print(word_count)
    except FileNotFoundError:
        msg = "Sorry, the file you entered, " + filename + ", could not be found."
        print(msg)

dracula = 'C:\\Users\\HP\\Desktop\\Programming\\Python\\Python Crash Course\\TEXT files\\dracula.txt'
siddhartha = 'C:\\Users\\HP\\Desktop\\Programming\\Python\\Python Crash Course\\TEXT files\\siddhartha.txt'
word_count(dracula)
word_count(siddhartha)
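Because a file object simply yields one line at a time, the counting loop can also be pulled out into a function that accepts any open text file. That makes it easy to try with io.StringIO, which behaves like an open file, before pointing it at a real book. A minimal sketch; count_the is a hypothetical name.

```python
import io

def count_the(f_obj):
    """Sum substring matches of 'the' over the lines of an open text file.

    The file object is iterated directly, so only one line is held at a time.
    """
    total = 0
    for line in f_obj:
        total += line.lower().count('the')
    return total

# io.StringIO stands in for an open file, so nothing is written to disk.
sample = io.StringIO("The end.\nOf the story.\n")
print(count_the(sample))  # 2
```

With a real file, the call would look like `with open(dracula) as f_obj: print(count_the(f_obj))`. Keeping the opening (and its error handling) separate from the counting is a design choice, not a requirement.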
Upvotes: 1
Reputation: 20366
You are re-assigning word_count on each iteration, so at the end it holds only the number of occurrences of 'the' in the last line of the file. You should be computing the sum instead. Another thing: should 'there' match? Probably not, so you probably want to use line.split(). Also, you can iterate over a file object directly; there's no need for .readlines(). One last thing: a generator expression simplifies the code. My first example is without the generator expression; the second is with it:
def word_count(filename):
    with open(filename) as f_obj:
        total = 0
        for line in f_obj:
            total += line.lower().split().count('the')
    print(total)

def word_count(filename):
    with open(filename) as f_obj:
        total = sum(line.lower().split().count('the') for line in f_obj)
    print(total)
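One caveat with split(): punctuation stays attached to the token, so "the." or "the," at the end of a clause is not counted. A regular expression with word boundaries (\b) avoids both that and the "there" problem. This swaps in the re module instead of split(); the sample sentence and the count_word helper are hypothetical, shown only to compare the three approaches.

```python
import re

# Hypothetical sample text: "other" and "there" contain 'the' as a substring,
# and the final "the." has punctuation attached.
text = "The other day, there was the storm. THE end, the."

# Substring count: also matches inside "other" and "there".
substring_count = text.lower().count('the')
# split() count: whole tokens only, so "the." is missed.
split_count = text.lower().split().count('the')
# Word-boundary regex: whole words only, case-insensitive, punctuation ignored.
regex_count = len(re.findall(r"\bthe\b", text, flags=re.IGNORECASE))

print(substring_count, split_count, regex_count)  # 6 3 4


def count_word(filename, word='the'):
    """Count whole-word, case-insensitive occurrences of word in a text file."""
    # re.escape guards against words that contain regex metacharacters.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    with open(filename) as f_obj:
        # Generator expression: only one line is held in memory at a time.
        return sum(len(pattern.findall(line)) for line in f_obj)
```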
Upvotes: 3