SagiLow

Reputation: 6039

Python regex: replacing dots and other punctuation followed by spaces

I want to replace a dot, ? or ! followed by spaces (if any) with a newline character \n and eliminate the whitespace. So for: hello world. It's nice. I want to get: hello world.\nIt's nice.\n
This is what I thought of (but it doesn't work, otherwise I wouldn't be writing this question, huh?):

re.sub(r'\.!?( *)', r'.\n\1', line)

Thanks !

Upvotes: 0

Views: 8224

Answers (3)

devnull

Reputation: 123608

Without lookaround:

>>> import re
>>> line="hello world! What? It's nice."
>>> re.sub(r'([.?!]+) *', r'\1\n', line)   # Capture the punctuation; discard the spaces
"hello world!\nWhat?\nIt's nice.\n"

>>> line="hello world! His thoughts trailed away... What?"
>>> re.sub(r'([.?!]+) *', r'\1\n', line)
'hello world!\nHis thoughts trailed away...\nWhat?\n'
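
The same substitution can be wrapped in a small helper. This is only a sketch: the function name `break_after_punctuation` is made up for illustration, and the pattern is the one from the session above.

```python
import re

def break_after_punctuation(line):
    # Hypothetical helper name; not part of the re module.
    # Capture a run of '.', '?' or '!' and drop any spaces that follow it,
    # then put the captured punctuation back followed by a newline.
    return re.sub(r'([.?!]+) *', r'\1\n', line)

result = break_after_punctuation("hello world.   It's nice.")
print(repr(result))
```

Because the spaces sit outside the capturing group, any number of them (including none) is discarded.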

Upvotes: 3

Martijn Pieters

Reputation: 1123830

Match spaces or the end of the string with a positive look-behind:

re.sub(r'(?<=[.?!])( +|\Z)', r'\n', text)

Because this matches just spaces that are preceded by punctuation, you don't need to use a back reference.

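The + requires at least one space after the punctuation (the \Z alternative covers the end of the string), so the empty positions between consecutive punctuation marks are not matched. The text: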

"His thoughts trailed away... His heart wasn't in it!"

would otherwise receive too many newlines.
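
A minimal sketch of that difference, assuming the alternative would be ` *` (zero or more spaces) in place of ` +`; the variable names are illustrative only.

```python
import re

text = "His thoughts trailed away... His heart wasn't in it!"

# One or more spaces, or the end of the string: one newline per sentence.
with_plus = re.sub(r'(?<=[.?!])( +|\Z)', r'\n', text)

# Zero or more spaces: the empty position after every dot also matches,
# so each dot of the ellipsis gets its own newline.
with_star = re.sub(r'(?<=[.?!]) *', r'\n', text)

print(repr(with_plus))
print(repr(with_star))
```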

Demo:

>>> import re
>>> text = "hello world. It's nice."
>>> re.sub(r'(?<=[.?!])( +|\Z)', r'\n', text)
"hello world.\nIt's nice.\n"
>>> text = "His thoughts trailed away... His heart wasn't in it!"
>>> re.sub(r'(?<=[.?!])( +|\Z)', r'\n', text)
"His thoughts trailed away...\nHis heart wasn't in it!\n"

Upvotes: 2

philshem

Reputation: 25351

Did you try replace?

print(text.replace('. ', '.\n').replace('? ', '?\n').replace('! ', '!\n'))
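
For comparison, here is a rough sketch of how the chained replace behaves on input with a double space and trailing punctuation, next to the lookbehind regex from the other answer. The sample text and variable names are made up for illustration.

```python
import re

text = "hello world.  It's nice."

# Each replace call handles exactly one punctuation mark plus one space.
chained = text.replace('. ', '.\n').replace('? ', '?\n').replace('! ', '!\n')

# Regex version: consumes every space and also matches at the end of the string.
with_regex = re.sub(r'(?<=[.?!])( +|\Z)', r'\n', text)

print(repr(chained))
print(repr(with_regex))
```

The chained version leaves the second space in place and adds no newline after the final dot, so it fits best when the input is known to use single spaces and no trailing newline is needed.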

Upvotes: 0
