HossBender

Reputation: 1079

.split python word count

I need to count the words in a sentence. For example, "I walk my dog." would be 4 words, but "I walk my 3 dogs" would also be only 4 words, because numbers are not words. The code should count only alphabetic words. I understand how to count words by simply using the following:

len(string.split())

but this doesn't account for numbers. Is there a simple way (for a beginner) to exclude numbers, symbols, etc.? Thank you.
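To make the problem concrete, here is a small sketch using the two sentences from the question. It shows that a plain split() counts the digit as a word:

```python
sentences = ["I walk my dog.", "I walk my 3 dogs"]

for sentence in sentences:
    # split() with no arguments splits on runs of whitespace
    tokens = sentence.split()
    print(sentence, "->", len(tokens), tokens)

# I walk my dog. -> 4 ['I', 'walk', 'my', 'dog.']
# I walk my 3 dogs -> 5 ['I', 'walk', 'my', '3', 'dogs']
```

Note that the trailing period stays attached to `dog.`, which matters for any approach based on `isalpha()`.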

Upvotes: 1

Views: 17788

Answers (5)

Jeff Langemeier

Reputation: 1018

Since the comments suggest he wants something that doesn't use .isalpha(), we could do this with a try/except:

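line = "I walk my 3 dogs"
count = 0
for word in line.split():
    try:
        int(word)      # succeeds only for integer tokens such as "3"
    except ValueError:
        count += 1     # anything int() can't parse is counted as a word
print(count)           # 4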

I know it's not pretty, but it correctly skips any token that int() can parse, such as 3.
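The same try/except idea can be wrapped in a reusable function. This is only a sketch: `count_words` is a hypothetical name, and swapping int() for float() is my own assumption so that decimal tokens such as `3.5` are skipped too.

```python
def count_words(line):
    """Count whitespace-separated tokens that do not parse as numbers."""
    count = 0
    for word in line.split():
        try:
            float(word)  # succeeds for "3", "-2", "3.5", ...
        except ValueError:
            count += 1   # not a number, so count it as a word
    return count

print(count_words("I walk my 3 dogs"))    # 4
print(count_words("I walk my 3.5 dogs"))  # 4
```

Be aware that float() also accepts strings like `nan` and `inf`, and that, as in the original, symbols such as `#` are still counted as words.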

Upvotes: 0

Jon Clements

Reputation: 142176

Here's another option:

import re

lines = [
    'I walk my dog',
    'I walk my 3 dogs',
    'I walk my Beagle-Harrier'  # DSM's example
]

for line in lines:
    # runs of ASCII letters and hyphens; re.I makes the match case-insensitive
    words = re.findall(r'[a-z-]+', line, flags=re.I)
    print(line, '->', len(words), words)

# I walk my dog -> 4 ['I', 'walk', 'my', 'dog']
# I walk my 3 dogs -> 4 ['I', 'walk', 'my', 'dogs']
# I walk my Beagle-Harrier -> 4 ['I', 'walk', 'my', 'Beagle-Harrier']
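One edge case: `[a-z-]+` also matches a stand-alone hyphen, so `"I walk - my dog"` gives 5 "words". A possible tightening (my own variation, not part of the answer above) is to require letters on both sides of each hyphen:

```python
import re

# Hypothetical tighter pattern: letters, optionally joined by single hyphens.
# Note that [a-z] with IGNORECASE still matches only ASCII letters.
WORD_RE = re.compile(r"[a-z]+(?:-[a-z]+)*", flags=re.IGNORECASE)

for line in ["I walk - my dog", "I walk my Beagle-Harrier", "I walk my 3 dogs."]:
    words = WORD_RE.findall(line)
    print(line, "->", len(words), words)

# I walk - my dog -> 4 ['I', 'walk', 'my', 'dog']
# I walk my Beagle-Harrier -> 4 ['I', 'walk', 'my', 'Beagle-Harrier']
# I walk my 3 dogs. -> 4 ['I', 'walk', 'my', 'dogs']
```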

Upvotes: 2

SethMMorton

Reputation: 48745

If you don't want to use .isalpha():

sum(not word.isdigit() for word in line.split())

The generator yields True for each word that is not a number and False for each word that is. This takes advantage of the fact that in Python bool is a subclass of int (True == 1 and False == 0), so summing the values gives the number of non-number words.


If you are uncomfortable relying on the int-ness of bools, you can make it explicit to the reader by wrapping the test in int(). This is 100% unnecessary, but it can make the code clearer if you prefer it that way:

sum(int(not word.isdigit()) for word in line.split())
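A quick check of the bool arithmetic described above, plus one limitation worth knowing: isdigit() only returns True when every character is a digit, so tokens like `3.5` or `-3` are still counted as words.

```python
line = "I walk my 3 dogs"

# bool is a subclass of int, so True and False add up like 1 and 0
print(isinstance(True, int))  # True
print(True + True + False)    # 2

# Both forms from the answer give the same count
print(sum(not word.isdigit() for word in line.split()))       # 4
print(sum(int(not word.isdigit()) for word in line.split()))  # 4

# Caveat: "3.5" and "-3" are not all digits, so they are counted
print(sum(not word.isdigit() for word in "I walk 3.5 or -3 dogs".split()))  # 6
```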

Upvotes: 0

Jason Scheirer

Reputation: 1698

You can use .isalpha() on strings.

len([word for word in sentence.split() if word.isalpha()])
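One thing to watch with the question's own example: `"dog."` contains a period, so isalpha() rejects it and `"I walk my dog."` counts as 3 words. A minimal workaround, assuming only leading and trailing punctuation needs to be ignored, is to strip it from each token first:

```python
import string

sentence = "I walk my dog."

# "dog." contains a period, so isalpha() rejects it
print(len([word for word in sentence.split() if word.isalpha()]))  # 3

# Strip leading/trailing punctuation from each token before testing it
print(len([word for word in sentence.split()
           if word.strip(string.punctuation).isalpha()]))  # 4
```

Internal punctuation is not removed by strip(), so words like `don't` or `Beagle-Harrier` would still be rejected.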

Upvotes: 1

thefourtheye

Reputation: 239513

totalWords = sum(1 for word in line.split() if word.isalpha())

Use split() on the line to break it on whitespace, then check whether each word contains only letters with isalpha(). For each word where that is true, yield 1, and sum them all at the end.
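For a beginner, the same steps written as an explicit loop may be easier to follow. This is just an unrolled sketch of the one-liner above:

```python
line = "I walk my 3 dogs"

total_words = 0
for word in line.split():   # split on whitespace
    if word.isalpha():      # True only if every character is a letter
        total_words += 1    # "include 1" for this word
print(total_words)          # 4
```

In Python 3, isalpha() also accepts non-ASCII letters, so a word like `café` is counted.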

Upvotes: 4
