Nick Ferrari

Reputation: 35

How do I stop regex from matching unwanted empty strings?

I'm working on a problem set that counts sentences. I decided to implement it by using a regular expression to split the string at the characters ".", "!" and "?". When I pass my text to re.split, the resulting list includes an empty string at the end.

source code:

from cs50 import get_string
import re


def main():
    text = get_string("Text: ")
    cole_liau(text)


# Implement 0.0588 * L - 0.296 * S - 15.8; L = avg num of letters per 100 words, S = avg num of sentences per 100 words
def cole_liau(intext):

    words = []
    letters = []

    sentences = re.split(r"[.!?]+", intext)
    print(sentences)
    print(len(sentences))

main()

Output:

Text: Congratulations! Today is your day. You're off to Great Places! You're off and away!

['Congratulations', ' Today is your day', " You're off to Great Places", " You're off and away", '']

5

I tried adding the + quantifier to make sure the pattern matches at least one of [.!?], but the empty string is still returned.

Upvotes: 2

Views: 175

Answers (2)

totok

Reputation: 1500

re.split is working fine here. Your last sentence ends with a !, so the text is split into what comes before that delimiter (a sentence) and what comes after it (an empty string, because nothing follows the final !).
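To make that behaviour concrete, here is a minimal sketch using two short made-up inputs (the strings are illustrative, not from the question). It shows that the trailing empty string only appears when the text ends with a delimiter:

```python
import re

# Text that ends with a delimiter: the final "!" splits off an empty remainder.
parts = re.split(r"[.!?]+", "One. Two!")
print(parts)  # ['One', ' Two', '']

# Text that does not end with a delimiter: no empty string is produced.
no_terminator = re.split(r"[.!?]+", "One. Two")
print(no_terminator)  # ['One', ' Two']
```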

As long as the text always ends with one of these characters, you can add [:-1] at the end of your line to remove the last element of the list:

sentences = re.split(r"[.!?]+", intext)[:-1]

Output:

['Congratulations', ' Today is your day', " You're off to Great Places", " You're off and away"]
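Note that an unconditional [:-1] would also drop a real sentence when the input does not end with ., ! or ?. A small guard avoids that; the sketch below is only one possible way to do it, and split_sentences is a hypothetical helper name, not part of the original code:

```python
import re


def split_sentences(text):
    """Hypothetical helper: split on . ! ? and drop only an empty trailing piece."""
    parts = re.split(r"[.!?]+", text)
    # Remove the last element only when it is empty or whitespace,
    # i.e. when the text really ended with a delimiter.
    if parts and parts[-1].strip() == "":
        parts = parts[:-1]
    return parts


print(split_sentences("A. B!"))  # ['A', ' B']
print(split_sentences("A. B"))   # ['A', ' B']
```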

Upvotes: 1

Jan

Reputation: 43169

You may use a comprehension:

def cole_liau(intext):

    words = []
    letters = []

    sentences = [sent for sent in re.split(r"[.!?]+", intext) if sent]
    print(sentences)
    print(len(sentences))

Which yields

['Congratulations', ' Today is your day', " You're off to Great Places", " You're off and away"]
4
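One thing to watch for: the `if sent` test keeps whitespace-only pieces, which can appear if the input has a trailing space or newline after the last delimiter. A possible variation, assuming you want to ignore such pieces, is to filter on the stripped string (count_sentences is a hypothetical name used only for this sketch):

```python
import re


def count_sentences(text):
    """Hypothetical helper: count non-blank pieces between . ! ? delimiters."""
    fragments = re.split(r"[.!?]+", text)
    # strip() turns whitespace-only pieces into "", which the filter then drops.
    sentences = [frag.strip() for frag in fragments if frag.strip()]
    return len(sentences)


# The trailing space after the final "." would count as a third "sentence"
# with a plain `if sent` filter; here it is ignored.
print(count_sentences("Congratulations! Today is your day. "))  # 2
```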

As to why re.split() returns an empty string, see this answer.

Upvotes: 1
