Reputation: 35
I'm working on a problem set that requires counting sentences. I decided to implement it with regular expressions, splitting the string at the characters ., ! and ?. When I pass my text to re.split, the resulting list includes an empty string at the end.
Source code:
from cs50 import get_string
import re

def main():
    text = get_string("Text: ")
    cole_liau(text)

# Implement 0.0588 * L - 0.296 * S - 15.8; L = avg num of letters / 100 words, S = avg num of sentences / 100 words
def cole_liau(intext):
    words = []
    letters = []
    sentences = re.split(r"[.!?]+", intext)
    print(sentences)
    print(len(sentences))

main()
Output:
Text: Congratulations! Today is your day. You're off to Great Places! You're off and away!
['Congratulations', ' Today is your day', " You're off to Great Places", " You're off and away", '']
5
I tried adding the + quantifier so that the pattern matches one or more of [.!?], but that did not remove the empty string either.
Upvotes: 2
Views: 175
Reputation: 1500
re.split is working fine here. Your text ends with a !, so the split returns the text before that final match (the last sentence) and the text after it (an empty string).
If your text always ends with one of these characters, you can just add [:-1] at the end of your line to remove the last element of the list:
sentences = re.split(r"[.!?]+", intext)[:-1]
Output:
['Congratulations', ' Today is your day', " You're off to Great Places", " You're off and away"]
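One caveat with the slice: if the text does not end with ., ! or ?, then [:-1] removes a real sentence. A minimal sketch of a guarded version, assuming a hypothetical helper named split_sentences (it is not part of the original code), drops the last element only when it is empty or whitespace:

```python
import re


def split_sentences(text):
    # Hypothetical helper for illustration; not part of the question's code.
    parts = re.split(r"[.!?]+", text)
    # Drop the final element only if it is empty or whitespace-only,
    # so input without a trailing terminator keeps its last sentence.
    if parts and not parts[-1].strip():
        parts = parts[:-1]
    return parts


with_terminator = split_sentences("Hello there! How are you?")
without_terminator = split_sentences("Hello there! How are you")
print(with_terminator)
print(without_terminator)
```

Both calls should give the same two-element list, whereas a plain [:-1] would leave only one element for the second input.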
Upvotes: 1
Reputation: 43169
You may use a comprehension:
def cole_liau(intext):
    words = []
    letters = []
    sentences = [sent for sent in re.split(r"[.!?]+", intext) if sent]
    print(sentences)
    print(len(sentences))
Which yields:
['Congratulations', ' Today is your day', " You're off to Great Places", " You're off and away"]
4
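Note that `if sent` only filters out strings that are exactly empty. If the input has trailing whitespace after the last punctuation mark, the final piece is a string such as ' ', which is truthy and would still be counted. A possible variation, sketched here with a hypothetical function name count_sentences, filters on the stripped string instead:

```python
import re


def count_sentences(text):
    # Hypothetical helper for illustration; not part of the original answer.
    # Strip each piece and keep it only if something is left, so that
    # whitespace-only fragments are not counted as sentences.
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return len(sentences)


sample = "Congratulations! Today is your day. You're off to Great Places! You're off and away!"
sample_count = count_sentences(sample)
trailing_space_count = count_sentences("Hi there. ")
print(sample_count)
print(trailing_space_count)
```

As a side effect, the stripped pieces no longer carry the leading space seen in the output above.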
As to why re.split() returns an empty string, see this answer.
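In short, re.split returns the text between matches, including whatever lies before the first match and after the last one. When a match sits at the very start or end of the string, that piece is an empty string. A small sketch of the behaviour (the sample strings are made up for illustration):

```python
import re

pattern = r"[.!?]+"

# Match at the end of the string: an empty string follows the last match.
trailing = re.split(pattern, "One. Two!")

# Match at the start of the string: an empty string precedes the first match.
leading = re.split(pattern, "...One. Two")

# str.split with an explicit separator behaves the same way.
plain = "a,b,".split(",")

print(trailing)
print(leading)
print(plain)
```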
Upvotes: 1