zppinto

Reputation: 297

Punctuation not detected between words with no space

How can I split text into sentences when the punctuation (. ? !) occurs between two words with no space after it?

Example:

>>> splitText = re.split(r"(?<=[.?!])\s+", "This is an example. Not "
...     "working as expected.Because there isn't a space after dot.")

output:

['This is an example.', 
"Not working as expected.Because there isn't a space after dot."] 

expected:

['This is an example.', 
'Not working as expected.', 
"Because there isn't a space after dot."]

Upvotes: 0

Views: 74

Answers (2)

glegoux

Reputation: 3603

Use https://regex101.com/r/icrJNl/3/.

import re
from pprint import pprint

split_text = re.findall(".*?[?.!]", "This is an example! Working as "
                        "expected?Because.")

pprint(split_text)

Note: .*? is a lazy (non-greedy) quantifier, as opposed to .*, which is greedy.

Output:

['This is an example!', 
 ' Working as expected?', 
 'Because.']
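
To make the lazy vs. greedy note above concrete, here is a small sketch that runs both patterns on the same input. The variable names (text, lazy, greedy) are only illustrative.

```python
import re

text = "This is an example! Working as expected?Because."

# Lazy: each match stops at the first terminator it reaches.
lazy = re.findall(r".*?[?.!]", text)

# Greedy: the match extends to the last terminator in the string,
# so the whole text comes back as a single item.
greedy = re.findall(r".*[?.!]", text)

print(lazy)
print(greedy)
```

With the greedy pattern there is nothing left to match after the first hit, which is why it is not useful for splitting into sentences.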

Another solution:

import re
from pprint import pprint

split_text = re.split("([?.!])", "This is an example! Working as "
    "expected?Because.")

pprint(split_text)

Output:

['This is an example', 
'!', 
' Working as expected', 
'?', 
'Because', 
'.', 
'']
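
If you go the re.split route and still want whole sentences, one possible way (a sketch, not the only option) is to pair each piece with the punctuation mark that follows it. The names parts and sentences are illustrative.

```python
import re

text = "This is an example! Working as expected?Because."

# With a capture group, re.split keeps the delimiters, so the list
# alternates: body, mark, body, mark, ..., trailing remainder.
parts = re.split(r"([?.!])", text)

# Pair every body with its mark and strip surrounding whitespace.
# zip() stops at the shorter sequence, so a trailing piece with no
# terminating mark (here the empty string) is dropped.
sentences = [
    (body + mark).strip()
    for body, mark in zip(parts[0::2], parts[1::2])
]

print(sentences)
```

Note that any text after the last punctuation mark is discarded by this pairing, so it only fits input where every sentence is terminated.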

Upvotes: 0

Stael

Reputation: 2689

splitText = re.split(r"[.?!]\s*", "This is an example. Not working as expected.Because there isn't a space after dot.")

+ matches one or more of something, * matches zero or more, so \s* also splits when no space follows the punctuation.

If you need to keep the punctuation, you probably don't want to split; instead you could do:

splitText = re.findall(".*?[.?!]", "This is an example. Not working as expected.Because there isn't a space after dot.")

which gives

['This is an example.',
 ' Not working as expected.',
 "Because there isn't a space after dot."]

You can trim those by adjusting the regex (e.g. r'\s*(.*?[.?!])', where the capture group leaves out the leading whitespace) or by calling .strip() on each result.
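
As a rough sketch of the trimming step, both variants below should give the same list for the example text. The names stripped and grouped are illustrative. Keep in mind that a pattern this simple will also cut at abbreviations and decimals (e.g. "3.14").

```python
import re

text = ("This is an example. Not working as expected."
        "Because there isn't a space after dot.")

# Variant 1: match lazily up to each terminator, then strip whitespace.
stripped = [s.strip() for s in re.findall(r".*?[.?!]", text)]

# Variant 2: consume leading whitespace outside the capture group;
# with exactly one group, findall returns just the group's text.
grouped = re.findall(r"\s*(.*?[.?!])", text)

print(stripped)
print(grouped)
```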

Upvotes: 1
