Reputation: 495
I have the following string:
input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
All of the punctuation should be separated from the words EXCEPT for "/", " ' ", "-", "+" and "$".
So the output should be:
"I love programming with Python-3 . 3 ! Do you ? It's great . . . I give it a 10/10 . It's free-to-use , no $$$ involved !"
I used the following code:
for x in string.punctuation:
    if x == "/":
        continue
    if x == "'":
        continue
    if x == "-":
        continue
    if x == "+":
        continue
    if x == "$":
        continue
    input = input.replace(x, " %s " % x)
I get the following output:
I love programming with Python-3 . 3 !  Do you ?  It's great .  .  .  I give it a 10/10 .  It's free-to-use ,  no $$$ involved !
It works, but the problem is that it sometimes leaves TWO spaces between the punctuation and the word, such as between the first exclamation mark in the sentence and the word "Do". This is because there is already a space between them.
This problem would also occur with: input = "Hello. (hi)". The output would be:
"Hello .  ( hi ) "
Note the two spaces before the open bracket.
I need the output to have only ONE space between any punctuation and the words, except for the five punctuation characters listed above, which should stay attached to the words. How can I fix this? Or is there a better way to do this using regex?
Thanks in advance.
Upvotes: 2
Views: 5044
Reputation: 1391
# Approach 1
import re
sample_input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
sample_input = re.sub(r"([^\s])([^\w\/'+$\s-])", r'\1 \2', sample_input)
print(re.sub(r"([^\w\/'+$\s-])([^\s])", r'\1 \2', sample_input))
# Approach 2
import string
sample_input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
punctuation = string.punctuation.replace('/', '').replace("'", '') \
    .replace('-', '').replace('+', '').replace('$', '')
i = 0
while i < len(sample_input):
    if sample_input[i] not in punctuation:
        i += 1
        continue
    if i > 0 and sample_input[i-1] != ' ':
        sample_input = sample_input[:i] + ' ' + sample_input[i:]
        i += 1
    if i + 1 < len(sample_input) and sample_input[i+1] != ' ':
        sample_input = sample_input[:i+1] + ' ' + sample_input[i+1:]
        i += 1
    i += 1
print(sample_input)
Upvotes: 0
Reputation: 5695
It seems to me a negated character class is simpler:
import re
input_string = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
print(re.sub(r"\s?([^\w\s'/\-\+$]+)\s?", r" \1 ", input_string))
Output:
I love programming with Python-3 . 3 ! Do you ? It's great ... I give it a 10/10 . It's free-to-use , no $$$ involved !
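Note that the output above keeps a run such as "..." together, while the question asks for ". . .". A minimal sketch of one way to split every punctuation character individually is to tokenize with re.findall and join the tokens with single spaces. The function name space_punctuation is hypothetical, and the sketch assumes that collapsing all whitespace to single spaces is acceptable. Because \w includes the underscore, "_" stays attached to words here, unlike with string.punctuation.

```python
import re

# A token is either a run of word characters plus the five kept
# characters (' / + $ -), or one single other punctuation character.
TOKEN = re.compile(r"[\w'/+$-]+|[^\w\s'/+$-]")


def space_punctuation(text):
    """Return text with each token separated by exactly one space."""
    return " ".join(TOKEN.findall(text))


print(space_punctuation("Hello. (hi)"))  # prints: Hello . ( hi )
```

Since the tokens are re-joined from scratch, there is no leading, trailing, or doubled space to clean up afterwards.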
Upvotes: 0
Reputation: 2621
I can't comment due to lack of reputation, but regarding this case:
between the first exclamation mark in the sentence and the word "Do"
There are two spaces because the input already has a space between "!" and "Do", and the replacement adds another one after the "!":
!  Do
So, if there is already a space after the punctuation, don't insert another one.
Also, there is a similar question here: python regex inserting a space between punctuation and letters
So maybe consider using re?
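The "don't add a second space" idea can also be sketched without regex: keep the replace loop from the question, then collapse any repeated whitespace afterwards. This is only a sketch under the assumption that normalizing all whitespace in the string is acceptable; the names KEEP and separate_punctuation are hypothetical.

```python
import string

# Characters that must stay attached to words, per the question.
KEEP = "/'-+$"


def separate_punctuation(text, keep=KEEP):
    """Pad punctuation with spaces, then collapse repeated whitespace."""
    for ch in string.punctuation:
        if ch in keep:
            continue
        text = text.replace(ch, " %s " % ch)
    # split() without arguments splits on runs of whitespace and drops
    # leading/trailing whitespace, so the join leaves single spaces only.
    return " ".join(text.split())


print(separate_punctuation("Hello. (hi)"))  # prints: Hello . ( hi )
```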
Upvotes: 0
Reputation: 881595
Looks like re can do it for you...
>>> import re
>>> re.sub(r"([\w/'+$\s-]+|[^\w/'+$\s-]+)\s*", r"\1 ", input)
"I love programming with Python-3 . 3 ! Do you ? It's great ... I give it a 10/10 . It's free-to-use , no $$$ involved ! "
and
>>> re.sub(r"([\w/'+$\s-]+|[^\w/'+$\s-]+)\s*", r"\1 ", "Hello. (hi)")
'Hello . ( hi ) '
If the trailing space is a problem, theresult.rstrip(' ') should fix it for you :-)
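As a minimal sketch, the substitution above can be wrapped in a helper that also drops the trailing space; Python strings provide the rstrip method for this. The names PATTERN and pad_punctuation are hypothetical, and the regex is the one from this answer, unchanged.

```python
import re

# Same pattern as above: a run of "word-like" characters (including
# whitespace and the five kept characters) or a run of other punctuation,
# followed by optional whitespace that gets replaced by one space.
PATTERN = re.compile(r"([\w/'+$\s-]+|[^\w/'+$\s-]+)\s*")


def pad_punctuation(text):
    """Apply the substitution and strip the trailing space it leaves."""
    return PATTERN.sub(r"\1 ", text).rstrip(" ")


print(pad_punctuation("Hello. (hi)"))  # prints: Hello . ( hi )
```

Like the original expression, this keeps consecutive punctuation such as "..." together as one group.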
Upvotes: 8
Reputation: 16940
Can I try it this way:
>>> import string
>>> input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
>>> ls = []
>>> for x in input:
...     if x in string.punctuation:
...         ls.append(' %s' % x)
...     else:
...         ls.append(x)
...
>>> ''.join(ls)
"I love programming with Python -3 .3 ! Do you ? It 's great . . . I give it a 10 /10 . It 's free -to -use , no $ $ $ involved !"
>>>
Upvotes: 0