David Agabi

Reputation: 43

Using the count method to count a certain word in a text file

I'm trying to count the number of times the word 'the' appears in two books saved as text files. The code I'm running returns zero for each book.

Here's my code:

def word_count(filename):
    """Count specified words in a text"""
    try:
        with open(filename) as f_obj:
            contents = f_obj.readlines()
            for line in contents:
                word_count = line.lower().count('the')
            print (word_count)

    except FileNotFoundError:
        msg = "Sorry, the file you entered, " + filename + ", could not be found."
        print (msg)

dracula = 'C:\\Users\\HP\\Desktop\\Programming\\Python\\Python Crash Course\\TEXT files\\dracula.txt'
siddhartha = 'C:\\Users\\HP\\Desktop\\Programming\\Python\\Python Crash Course\\TEXT files\\siddhartha.txt'

word_count(dracula)
word_count(siddhartha)

What am I doing wrong here?

Upvotes: 1

Views: 162

Answers (4)

misterrodger

Reputation: 225

Another way:

with open(filename) as f_obj:
    contents = f_obj.read()
    print("The word 'the' appears " + str(contents.lower().count('the')) + " times")

Upvotes: 1

andreytata

Reputation: 19

import os
def word_count(filename):
    """Count specified words in a text"""
    if os.path.exists(filename):
        if not os.path.isdir(filename):
            with open(filename) as f_obj:
                # count the word 'the' (counting 't' would tally every letter t)
                print(f_obj.read().lower().count('the'))
        else:
            print("path is a folder, not a file: '%s'" % filename)
    else:
        print("path not found: '%s'" % filename)

Upvotes: 0

jedwards

Reputation: 30250

Each pass through the loop overwrites word_count, so only the count from the last line survives. Unless the word 'the' appears on the last line of each file, you'll see zeros.
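A minimal sketch of the difference, using a made-up list of lines in place of a real file (sample_lines and both function names are hypothetical, not from the question):

```python
# Hypothetical stand-in for the lines of a book; not taken from either file.
sample_lines = [
    "The cat sat on the mat.\n",
    "Then the dog barked.\n",
    "Nothing here.\n",
]

def last_line_only(lines):
    count = 0
    for line in lines:
        count = line.lower().count('the')   # overwritten on every pass
    return count

def running_total(lines):
    count = 0
    for line in lines:
        count += line.lower().count('the')  # accumulated across lines
    return count

print(last_line_only(sample_lines))   # count from the final line only
print(running_total(sample_lines))    # count over all lines
```

The first function reports only what the final line contains, which is the behaviour described in the question.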

You likely want to initialize the word_count variable to zero, then use augmented assignment (+=). For example:

def word_count(filename):
    """Count specified words in a text"""
    try:
        word_count = 0                                       # <- change #1 here
        with open(filename) as f_obj:
            contents = f_obj.readlines()
            for line in contents:
                word_count += line.lower().count('the')      # <- change #2 here
            print(word_count)

    except FileNotFoundError:
        msg = "Sorry, the file you entered, " + filename + ", could not be found."
        print(msg)                                           # inside except, so msg is always defined here

dracula = 'C:\\Users\\HP\\Desktop\\Programming\\Python\\Python Crash Course\\TEXT files\\dracula.txt'
siddhartha = 'C:\\Users\\HP\\Desktop\\Programming\\Python\\Python Crash Course\\TEXT files\\siddhartha.txt'

word_count(dracula)
word_count(siddhartha)

Augmented assignment isn't necessary, just helpful. This line:

word_count += line.lower().count('the')

could be written as

word_count = word_count + line.lower().count('the')

But you also don't need to read the lines all into memory at once. You can iterate over the lines right from the file object. For example:

def word_count(filename):
    """Count specified words in a text"""
    try:
        word_count = 0
        with open(filename) as f_obj:
            for line in f_obj:                     # <- change here
                word_count += line.lower().count('the')
        print(word_count)

    except FileNotFoundError:
        msg = "Sorry, the file you entered, " + filename + ", could not be found."
        print(msg)

dracula = 'C:\\Users\\HP\\Desktop\\Programming\\Python\\Python Crash Course\\TEXT files\\dracula.txt'
siddhartha = 'C:\\Users\\HP\\Desktop\\Programming\\Python\\Python Crash Course\\TEXT files\\siddhartha.txt'

word_count(dracula)
word_count(siddhartha)
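If you want to reuse the number rather than only print it, one possible variation is to return the total. This is just a sketch: count_word and its word parameter are names I made up for illustration, and like the code above it counts substrings, not whole words.

```python
def count_word(filename, word='the'):
    """Return how often `word` occurs as a substring, or None if the file is missing."""
    try:
        total = 0
        with open(filename) as f_obj:
            for line in f_obj:
                # compare in lower case so 'The' and 'the' both match
                total += line.lower().count(word.lower())
        return total
    except FileNotFoundError:
        print("Sorry, the file you entered, " + filename + ", could not be found.")
        return None
```

The caller can then do something like total = count_word(dracula) and decide how to report it.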

Upvotes: 1

zondo

Reputation: 20366

You are re-assigning word_count on each iteration. That means that at the end it will equal the number of occurrences of 'the' in the last line of the file. You should be getting the sum instead. Another thing: should 'there' match? Probably not, so you probably want to use line.split() and count whole words. Also, you can iterate through a file object directly; there is no need for .readlines(). One last thing: use a generator expression to simplify. My first example is without the generator expression; the second is with it:

def word_count(filename):
    with open(filename) as f_obj:
        total = 0
        for line in f_obj:
            total += line.lower().split().count('the')
        print(total)

def word_count(filename):
    with open(filename) as f_obj:
        total = sum(line.lower().split().count('the') for line in f_obj)
        print(total)
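One caveat with split(): it only splits on whitespace, so tokens such as 'the,' or '"the"' keep their punctuation and will not equal 'the'. If that matters for your books, a regular expression with word boundaries is one option. A small sketch on a made-up sentence (sample is hypothetical, not from either file):

```python
import re

# Hypothetical sample text, chosen to include 'other', 'there' and punctuation.
sample = 'The other day, "the" cat said: there is the, the end.'
text = sample.lower()

substring_hits = text.count('the')                 # also matches inside 'other' and 'there'
split_hits = text.split().count('the')             # misses '"the"' and 'the,'
regex_hits = len(re.findall(r'\bthe\b', text))     # whole word, punctuation ignored

print(substring_hits, split_hits, regex_hits)
```

Which of the three you want depends on what you consider a "word" in the text.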

Upvotes: 3
