Reputation: 297
How can I split text into sentences at punctuation (., ?, !), even when the punctuation sits between two words with no space after it?
Example:
>>> splitText = re.split(r"(?<=[.?!])\s+", "This is an example. Not working as expected.Because there isn't a space after dot.")
output:
['This is an example.',
"Not working as expected.Because there isn't a space after dot."]
expected:
['This is an example.',
'Not working as expected.',
"Because there isn't a space after dot."]
Upvotes: 0
Views: 74
Reputation: 3603
Use re.findall with a lazy pattern (demo: https://regex101.com/r/icrJNl/3/).
import re
from pprint import pprint
split_text = re.findall(".*?[?.!]", "This is an example! Working as "
"expected?Because.")
pprint(split_text)
Note: .*? is a lazy (non-greedy) quantifier, as opposed to .*, which is greedy.
Output:
['This is an example!',
' Working as expected?',
'Because.']
Another solution:
import re
from pprint import pprint
split_text = re.split("([?.!])", "This is an example! Working as "
"expected?Because.")
pprint(split_text)
Output:
['This is an example',
'!',
' Working as expected',
'?',
'Because',
'.',
'']
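If you want whole sentences back from this split, one option is to pair each text chunk with the delimiter that follows it. This is only a sketch: the names parts and sentences are mine, and any trailing text without closing punctuation is dropped by zip.

```python
import re

text = "This is an example! Working as expected?Because."

# With a capturing group, re.split returns text chunks and delimiters
# alternately: [chunk, mark, chunk, mark, ..., trailing chunk].
parts = re.split(r"([?.!])", text)

# Pair every chunk (even indexes) with its punctuation mark (odd indexes).
# zip stops at the shorter list, so the trailing '' is discarded.
sentences = [
    (body + mark).strip()
    for body, mark in zip(parts[0::2], parts[1::2])
]

print(sentences)
```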
Upvotes: 0
Reputation: 2689
splitText = re.split(r"[.?!]\s*", "This is an example. Not working as expected.Because there isn't a space after dot.")
+ matches one or more of the preceding item, * matches zero or more, so \s* still splits when no space follows the punctuation.
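A small comparison of the two quantifiers on the question's text may make the difference concrete. The variable names are illustrative, and the filter for empty strings is my addition, since splitting on the final dot leaves an empty last item.

```python
import re

text = ("This is an example. Not working as expected."
        "Because there isn't a space after dot.")

# \s+ needs at least one whitespace character after the punctuation,
# so "expected.Because" is not split (and neither is the final dot).
with_plus = re.split(r"[.?!]\s+", text)

# \s* also accepts zero whitespace characters, so every mark splits.
# The final "." produces a trailing empty string, filtered out here.
with_star = [s for s in re.split(r"[.?!]\s*", text) if s]

print(with_plus)
print(with_star)
```

Note that both variants consume the punctuation itself, which is why the results no longer end in a dot.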
If you need to keep the punctuation, you probably don't want to split. Instead you could use re.findall:
splitText = re.findall(".*?[.?!]", "This is an example. Not working as expected.Because there isn't a space after dot.")
which gives
['This is an example.',
' Not working as expected.',
"Because there isn't a space after dot."]
You can trim those by adjusting the regex (e.g. r'\s*(.*?[.?!])', where the capture group leaves out the leading whitespace) or by calling .strip() on each item.
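Both trimming approaches can be sketched as follows. The variable names are mine. The regex variant relies on re.findall returning only the group's text when the pattern contains exactly one capturing group.

```python
import re

text = ("This is an example. Not working as expected."
        "Because there isn't a space after dot.")

# Option 1: match leading whitespace outside the group, so findall
# returns only the captured sentence.
trimmed_by_regex = re.findall(r"\s*(.*?[.?!])", text)

# Option 2: keep the simple pattern and strip each match afterwards.
trimmed_by_strip = [s.strip() for s in re.findall(r".*?[.?!]", text)]

print(trimmed_by_regex)
print(trimmed_by_strip)
```

Either way, text after the last punctuation mark is not returned, and . does not match newlines unless re.DOTALL is passed.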
Upvotes: 1