George Willcox

Reputation: 673

How to check how many times a word appears in a text file

I've seen a few people ask how to do this, but their questions were closed as 'too broad', so I decided to work it out myself. I've posted my solution as an answer below.

Upvotes: 0

Views: 5738

Answers (5)

magicandrei

Reputation: 11

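def words_frequency_counter(filename):
    """Print how many times a user-supplied word appears in the file."""
    try:
        with open(filename) as file_object:
            contents = file_object.read()
    except FileNotFoundError:
        # Report the problem instead of failing silently.
        print(f"File not found: {filename}")
    else:
        word = input("Give me a word: ")
        # str.count matches substrings, so 'cat' is also counted inside 'category'.
        occurrences = contents.lower().count(word.lower())
        print(f"'{word}' appears {occurrences} times.\n")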
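
Because str.count matches substrings, a search for a short word also counts the longer words that contain it. A minimal sketch of a whole-word variant, assuming a hypothetical helper named count_whole_word (not part of the answer above) and regular-expression word boundaries:

```python
import re

def count_whole_word(text, word):
    """Count case-insensitive whole-word occurrences of `word` in `text`."""
    # \b anchors the match at word boundaries; re.escape protects against
    # regex metacharacters in the search word.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))

sample = "The cat sat there; the other cat left."

# Substring counting also matches "the" inside "there" and "other".
substring_hits = sample.lower().count("the")
whole_word_hits = count_whole_word(sample, "the")

print(substring_hits, whole_word_hits)
```

The two numbers differ whenever the search word is embedded in longer words, so which one is right depends on what you mean by "appears".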

Upvotes: 1

cdlane

Reputation: 41905

Splitting on whitespace isn't sufficient -- split on everything you're not counting and get your case under control:

import re
import sys

word = sys.argv[2]

# Close the file as soon as it has been read.
with open(sys.argv[1]) as file:
    text = file.read().casefold()

print(re.split(r"[^a-z]+", text).count(word.casefold()))

You can add apostrophes to the inverted pattern [^a-z'] or whatever else you want to include in your count.
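
To illustrate the effect of adding the apostrophe, here is a small sketch that works on an in-memory string rather than a file. The helper name count_word and its pattern parameter are assumptions made for this example only:

```python
import re

def count_word(text, word, pattern=r"[^a-z']+"):
    """Split `text` on everything outside `pattern`'s kept characters, then count `word`."""
    tokens = re.split(pattern, text.casefold())
    return tokens.count(word.casefold())

sample = "Don't stop. I don't know; DON'T ask."

# With apostrophes kept, "don't" survives as a single token.
with_apostrophe = count_word(sample, "don't")

# With the stricter pattern, "don't" is split into "don" and "t".
without_apostrophe = count_word(sample, "don", pattern=r"[^a-z]+")

print(with_apostrophe, without_apostrophe)
```

Note that both patterns only keep the ASCII letters a-z, so accented or non-Latin words would need a wider character class.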

Hogan: Colonel, you're asking and answering your own questions. That's tops in German efficiency.

Upvotes: 1

tdelaney

Reputation: 77407

Word counts can be tricky. At a minimum, one would like to ignore differences in capitalization and punctuation. A simple next step is to extract the words with a regular expression and convert them to lower case before counting. We could even use collections.Counter and count all of the words.

import re

# `word_finder(somestring)` emits all words in string as list
word_finder = re.compile(r'\w+').findall

filename = input('filename: ')
word = input('word: ')

# remove case for compare
lword = word.lower()

# `word_finder` emits all of the words excluding punctuation
# `filter` removes the lower cased words we don't want
# `len` counts the result
count = len(list(filter(lambda w: w.lower() == lword,
    word_finder(open(filename).read()))))
print(count)

# we could go crazy and count all of the words in the file
# and do it line by line to reduce memory footprint.
import collections
import itertools
from pprint import pprint

word_counts = collections.Counter(itertools.chain.from_iterable(
    word_finder(line.lower()) for line in open(filename)))
# pprint prints the counter itself and returns None, so don't wrap it in print()
pprint(word_counts)
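
Once all of the words are in a Counter, looking up a single word is a dictionary access, and a missing word yields 0 rather than raising KeyError. A minimal sketch, using a hard-coded list of lines (an assumption standing in for the open file) so that it runs on its own:

```python
import collections
import re

word_finder = re.compile(r"\w+").findall

# Stand-in for iterating over an open file line by line.
lines = ["The quick brown fox.", "The lazy dog; the end."]

word_counts = collections.Counter(
    w for line in lines for w in word_finder(line.lower()))

the_count = word_counts["the"]   # count for a word that is present
cat_count = word_counts["cat"]   # Counter returns 0 for unseen words
top_word = word_counts.most_common(1)

print(the_count, cat_count, top_word)
```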

Upvotes: 1

linky00

Reputation: 65

First, you want to open the file. Do this with:

your_file = open('file.txt', 'r')

Next, you want to count the word. Let's store the word 'brian' in a variable called life. No reason.

life = 'brian'
print(your_file.read().split().count(life))

This reads the file, splits it on whitespace into individual words, and counts the instances of the word 'brian'. Hope this helps!

Upvotes: 0

George Willcox

Reputation: 673

To do this, you must first open the file (assuming you have a text file called 'text.txt'). We do this by calling the open function.

file = open('text.txt', 'r')

The open function uses the syntax: open(file, mode)

Here, file is the path to the text document and mode is how it's opened ('r' means read only). The read method returns the whole file as a string, then split separates it on whitespace into a list of words. Lastly, we use the list's count method to find how many times the word appears.

word = input('word: ')
print(file.read().split().count(word))

And there you have it, counting words in a text file!
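
The same steps can be wrapped in a function that closes the file automatically. This is only a sketch: the function name count_word_in_file and the throwaway temporary file are assumptions made for the example. It also shows that this approach counts exact, whitespace-delimited matches only, so 'Spam' and 'spam,' are not counted as 'spam'.

```python
import os
import tempfile

def count_word_in_file(path, word):
    """Count exact, whitespace-delimited occurrences of `word` in the file."""
    # The with block closes the file even if reading fails.
    with open(path, "r") as f:
        return f.read().split().count(word)

# Write a small throwaway file to demonstrate on.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("spam eggs spam\nSpam spam, eggs")
    tmp_path = tmp.name

try:
    spam_count = count_word_in_file(tmp_path, "spam")
    eggs_count = count_word_in_file(tmp_path, "eggs")
finally:
    os.remove(tmp_path)  # clean up the temporary file

print(spam_count, eggs_count)
```

If capitalization and punctuation should be ignored, the text needs to be normalized before counting.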

Upvotes: 1
