Reputation: 81
I'm having a bit of trouble trying to count the number of words per sentence. For my case, I'm assuming sentences only end with either "!", "?", or ".".
I have a list that looks like this:
["Hey", "!", "How", "are", "you", "?", "I", "would", "like", "a", "sandwich", "."]
For the example above, the calculation would be (1 + 3 + 5) / 3 = 3. I'm having a hard time achieving this, though! Any ideas?
Upvotes: 0
Views: 4495
Reputation: 111
A simple solution:
mylist = ["Hey", "!", "How", "are", "you", "?", "I", "would", "like", "a", "sandwich", "."]
terminals = set([".", "?", "!"]) # sets are efficient for "membership" tests
terminal_count = 0
for item in mylist:
    if item in terminals:  # here is our membership test
        terminal_count += 1
avg = (len(mylist) - terminal_count) / float(terminal_count)
This assumes you only care about getting the average, not the individual counts per sentence.
If you'd like to get a little fancy, you can replace the for loop with something like this:
terminal_count = sum(1 for item in mylist if item in terminals)
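If you do need the individual counts per sentence as well, one option is `itertools.groupby`. This is only a sketch under a few assumptions: it reuses the `mylist` and `terminals` names from above, `counts` is a name introduced here for illustration, and any trailing words with no closing terminal are counted as a sentence of their own.

```python
from itertools import groupby

mylist = ["Hey", "!", "How", "are", "you", "?", "I", "would", "like", "a", "sandwich", "."]
terminals = {".", "?", "!"}

# groupby() clusters consecutive items that share the same key value,
# so each run of non-terminal tokens is one sentence.
counts = [
    len(list(group))
    for is_terminal, group in groupby(mylist, key=lambda item: item in terminals)
    if not is_terminal
]

# Guard against an input with no words at all.
avg = sum(counts) / len(counts) if counts else 0.0

print(counts, avg)  # [1, 3, 5] 3.0
```

Because consecutive terminals fall into a single run, something like "?", "!" in a row does not produce an empty sentence here.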
Upvotes: 3
Reputation: 92854
Short solution using re.split() and sum() functions:
import re
s = "Hey ! How are you ? I would like a sandwich ."
parts = [len(l.split()) for l in re.split(r'[?!.]', s) if l.strip()]
print(sum(parts)/len(parts))
The output:
3.0
If the input is only available as a list of words:
import re
s = ["Hey", "!", "How", "are", "you", "?", "I", "would", "like", "a", "sandwich", "."]
parts = [len(l.split()) for l in re.split(r'[?!.]', ' '.join(s)) if l.strip()]
print(sum(parts)/len(parts)) # 3.0
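One thing to watch with the `[?!.]` character class: after joining, it also splits on punctuation inside a token (a price like "3.50", for example). The sketch below compares it with a stricter pattern that only matches a terminal standing alone between whitespace. The sample `tokens` list and the names `naive` and `strict` are made up for illustration, and the stricter pattern assumes terminals are always separate tokens, as in the question.

```python
import re

tokens = ["It", "costs", "3.50", "."]  # hypothetical input with a "." inside a token
text = " ".join(tokens)

# Character-class split: the "." inside "3.50" is treated as a sentence end too.
naive = [len(p.split()) for p in re.split(r"[?!.]", text) if p.strip()]

# Split only on a terminal that is not touching any non-whitespace character.
strict = [len(p.split()) for p in re.split(r"(?<!\S)[?!.](?!\S)", text) if p.strip()]

print(naive)   # [3, 1]
print(strict)  # [3]
```

For the list in the question, both patterns give the same result, since none of its words contain a terminal character.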
Upvotes: 1
Reputation: 113985
words = ["Hey", "!", "How", "are", "you", "?", "I", "would", "like", "a", "sandwich", "."]
sentences = [[]]
ends = set(".?!")
for word in words:
    if word in ends: sentences.append([])
    else: sentences[-1].append(word)
if sentences[0]:
    if not sentences[-1]: sentences.pop()
    print("average sentence length:", sum(len(s) for s in sentences)/len(sentences))
Upvotes: 3