Reputation: 495
I have the following input:
input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
First, every sentence should be moved to a new line. Then, all of the punctuation should be separated from the words EXCEPT for "/", " ' ", "-", "+" and "$".
So the output should be:
"I love programming with Python-3 . 3 !
Do you ?
It's great . . .
I give it a 10/10 .
It's free-to-use , no $$$ involved !"
I used the following code:
>>> import re
>>> re.sub(r"([\w/'+$\s-]+|[^\w/'+$\s-]+)\s*", r"\1 ", input)
"I love programming with Python-3 . 3 ! Do you ? It's great ... I give it a 10/10 . It's free-to-use , no $$$ involved ! "
But the problem is that it does not separate sentences into new lines. How can I use a regex to do that before I create whitespace between punctuation and characters?
Upvotes: 2
Views: 2677
Reputation: 67968
([!?.])(?=\s*[A-Z])\s*
You can use this regex to split the text into sentences before applying your own regex. Replace each match with \1\n. See the demo:
https://regex101.com/r/sH8aR8/5
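import re
x = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
# Break after ., ! or ? when a capital letter follows; the trailing \s* drops the space before the next sentence.
print(re.sub(r"([!?.])(?=\s*[A-Z])\s*", r"\1\n", x))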
EDIT:
(?<![A-Z][a-z])([!?.])(?=\s*[A-Z])\s*
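Try this for your other set of data. The negative lookbehind (?<![A-Z][a-z]) skips punctuation that directly follows a capital letter plus one lowercase letter, so an abbreviation such as "Mr." does not end a sentence. See the demo: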
https://regex101.com/r/sH8aR8/9
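To see what the lookbehind changes, here is a minimal sketch that runs both patterns on a made-up sentence. The sample string and the names `plain` and `guarded` are assumptions for illustration, not part of the question's data.

```python
import re

# Hypothetical sample containing an abbreviation; not from the question.
sample = "I met Mr. Smith today. He uses Python."

# First pattern: break after ., ! or ? whenever a capital letter follows.
plain = re.compile(r"([!?.])(?=\s*[A-Z])\s*")
# Second pattern: same, but not when the punctuation follows "capital + lowercase".
guarded = re.compile(r"(?<![A-Z][a-z])([!?.])(?=\s*[A-Z])\s*")

plain_result = plain.sub(r"\1\n", sample)
guarded_result = guarded.sub(r"\1\n", sample)

print(plain_result)    # breaks after "Mr." as well
print(guarded_result)  # keeps "Mr. Smith" on one line
```

One caveat: the lookbehind only looks at the two characters before the punctuation, so a sentence that really ends in a short capitalised word (for example "Ask Al. He knows.") will not be split either.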
Upvotes: 2
Reputation: 26667
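Something like this (re.escape keeps characters such as ] and \ in punctuation from being misread inside the character class):
>>> import re
>>> from string import punctuation
>>> print(re.sub(r'(?<=[' + re.escape(punctuation) + r'])\s+(?=[A-Z])', '\n', input))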
I love programming with Python-3.3!
Do you?
It's great...
I give it a 10/10.
It's free-to-use, no $$$ involved!
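Putting the two steps together could look roughly like the sketch below: split into sentences first, then space out the punctuation on each line. The helper name `split_and_space` and the variable `text` are assumptions. The second pattern is a small variation on the question's regex, not the exact original: the punctuation alternative matches one character at a time, so that "..." comes out as ". . ." as in the desired output.

```python
import re
from string import punctuation

text = ("I love programming with Python-3.3! Do you? It's great... "
        "I give it a 10/10. It's free-to-use, no $$$ involved!")

# Step 1: break where punctuation is followed by whitespace and a capital letter.
SENTENCE_BREAK = re.compile(r"(?<=[" + re.escape(punctuation) + r"])\s+(?=[A-Z])")

# Step 2: the question's pattern, except that the punctuation alternative has
# no trailing "+", so each punctuation character is spaced out on its own.
SPACER = re.compile(r"([\w/'+$\s-]+|[^\w/'+$\s-])\s*")

def split_and_space(s):
    """Return one sentence per line with punctuation separated (hypothetical helper)."""
    sentences = SENTENCE_BREAK.sub("\n", s).split("\n")
    # strip() removes the trailing space that the replacement leaves on each line.
    return "\n".join(SPACER.sub(r"\1 ", sentence).strip() for sentence in sentences)

result = split_and_space(text)
print(result)
```

Doing the split first keeps the sentence regex simple, because it still sees the original spacing between a sentence's last punctuation mark and the next capital letter.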
Upvotes: 2