modarwish

Reputation: 495

Python - How do I separate punctuation from words by white space leaving only one space between the punctuation and the word?

I have the following string:

input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"

All of the punctuation should be separated from the words EXCEPT for "/", " ' ", "-", "+" and "$".

So the output should be:

"I love programming with Python-3 . 3 ! Do you ? It's great . . . I give it a 10/10 . It's free-to-use , no $$$ involved !"

I used the following code:

import string

for x in string.punctuation:
    if x == "/":
        continue
    if x == "'":
        continue
    if x == "-":
        continue
    if x == "+":
        continue
    if x == "$":
        continue
    input = input.replace(x," %s " % x)

I get the following output:

I love programming with Python-3 . 3 !  Do you ?  It's great .  .  .  I give it a 10/10 .  It's free-to-use ,  no $$$ involved ! 

It works, but the problem is that it sometimes leaves TWO spaces between the punctuation and the word, such as between the first exclamation mark in the sentence and the word "Do". This is because there is already a space between them.

This problem would also occur with: input = "Hello. (hi)". The output would be:

" Hello .  ( hi ) "

Note the two spaces before the open bracket.

I need the output with only ONE space between any punctuation and the words, except for the five punctuation marks mentioned above, which are not separated from words. How can I fix this? Or is there a better way to do this using regex?

Thanks in advance.

Upvotes: 2

Views: 5044

Answers (5)

Johann Chang

Reputation: 1391

# Approach 1

import re

sample_input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"

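# Pass 1: insert a space between a non-space character and a following punctuation mark.
sample_input = re.sub(r"([^\s])([^\w\/'+$\s-])", r'\1 \2', sample_input)
# Pass 2: insert a space between a punctuation mark and a following non-space character.
print(re.sub(r"([^\w\/'+$\s-])([^\s])", r'\1 \2', sample_input))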

# Approach 2

import string

sample_input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"

punctuation = string.punctuation.replace('/', '').replace("'", '') \
        .replace('-', '').replace('+', '').replace('$', '')

i = 0

while i < len(sample_input):
    if sample_input[i] not in punctuation:
        i += 1
        continue

    if i > 0 and sample_input[i-1] != ' ':
        sample_input = sample_input[:i] + ' ' + sample_input[i:]
        i += 1

    if i + 1 < len(sample_input) and sample_input[i+1] != ' ':
        sample_input = sample_input[:i+1] + ' ' + sample_input[i+1:]
        i += 1

    i += 1

print(sample_input)

Upvotes: 0

Michel Müller

Reputation: 5695

It seems to me a negated character class is simpler:

import re

input_string = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"

print(re.sub(r"\s?([^\w\s'/\-\+$]+)\s?", r" \1 ", input_string))

Output:

I love programming with Python-3 . 3 ! Do you ? It's great ... I give it a 10/10 . It's free-to-use , no $$$ involved ! 
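The output requested in the question separates each dot (". . ." rather than "...") and has no trailing space. One way to get that, sketched here under the assumption that collapsing every run of whitespace to a single space is acceptable, is to drop the `+` quantifier and normalise whitespace afterwards. The helper name `separate_punctuation` is hypothetical and not part of this answer.

```python
import re

# Match one punctuation mark at a time: anything that is not a word
# character, whitespace, or one of the exempt characters / ' - + $
_PUNCT = re.compile(r"([^\w\s'/+$-])")


def separate_punctuation(text):
    """Put single spaces around each punctuation mark except / ' - + $."""
    spaced = _PUNCT.sub(r" \1 ", text)
    # split() without arguments drops leading/trailing whitespace and
    # treats any run of whitespace as a single separator.
    return " ".join(spaced.split())


sample = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
print(separate_punctuation(sample))
print(separate_punctuation("Hello. (hi)"))  # Hello . ( hi )
```

Note that the split/join step also turns tabs and newlines into single spaces, which may or may not be wanted.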

Upvotes: 0

freddiev4

Reputation: 2621

I can't comment yet for lack of reputation, but regarding this case:

between the first exclamation mark in the sentence and the word "Do"

It looks like there are two spaces because there is already a space between ! and Do:

! Do

So, if there is already a space after the punctuation, don't put another space.

Also, there is a similar question here: python regex inserting a space between punctuation and letters

So maybe consider using re?
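The "only add a space where none exists" idea can be sketched with zero-width lookarounds, which match the gap between two characters rather than the characters themselves. This is only an illustration: the names `PUNCT`, `GAP` and `add_missing_spaces` are made up for the example, and the exempt set (/ ' - + $) is taken from the question.

```python
import re

# Punctuation to separate: anything that is not a word character,
# whitespace, or one of the exempt characters / ' - + $
PUNCT = r"[^\w\s'/+$-]"

# Zero-width positions where a space is missing:
#   - a non-space character directly before a punctuation mark, or
#   - a punctuation mark directly before a non-space character
GAP = re.compile(r"(?<=\S)(?=%s)|(?<=%s)(?=\S)" % (PUNCT, PUNCT))


def add_missing_spaces(text):
    """Insert a space only at positions that do not already have one."""
    return GAP.sub(" ", text)


print(add_missing_spaces("Python-3.3! Do you?"))  # Python-3 . 3 ! Do you ?
print(add_missing_spaces("Hello. (hi)"))          # Hello . ( hi )
```

Because nothing is ever removed, whitespace that is already in the input (including any existing double spaces) is left exactly as it was.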

Upvotes: 0

Alex Martelli

Reputation: 881595

Looks like re can do it for you...

>>> import re
>>> re.sub(r"([\w/'+$\s-]+|[^\w/'+$\s-]+)\s*", r"\1 ", input)
"I love programming with Python-3 . 3 ! Do you ? It's great ... I give it a 10/10 . It's free-to-use , no $$$ involved ! "

and

>>> re.sub(r"([\w/'+$\s-]+|[^\w/'+$\s-]+)\s*", r"\1 ", "Hello. (hi)")
'Hello . ( hi ) '

If the trailing space is a problem, theresult.rstrip(' ') should fix it for you :-)
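As a sketch of how the trailing space could be removed, `str.rstrip` can be chained onto the substitution. One caveat, offered tentatively: because the first character class in the pattern above includes `\s`, an input such as "hi (there)" appears to come out with two spaces before the bracket, since the space is captured in the group and another one is appended. The variant below leaves `\s` out of the first class so that whitespace is always consumed by the trailing `\s*`. The names `TOKEN` and `space_out` are hypothetical.

```python
import re

# Variant of the pattern above with \s removed from the first class, so
# whitespace is consumed by the trailing \s* instead of being captured.
TOKEN = re.compile(r"([\w/'+$-]+|[^\w/'+$\s-]+)\s*")


def space_out(text):
    # rstrip(" ") removes the single space appended after the last token.
    return TOKEN.sub(r"\1 ", text).rstrip(" ")


print(space_out("Hello. (hi)"))  # Hello . ( hi )
print(space_out("hi (there)"))   # hi ( there )
```

Like the original pattern, this keeps runs of punctuation such as "..." together as one token.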

Upvotes: 8

James Sapam

Reputation: 16940

You can try it this way:

>>> import string
>>> input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
>>> ls = []
>>> for x in input:
...     if x in string.punctuation:
...         ls.append(' %s' % x)
...     else:
...         ls.append(x)
...
>>> ''.join(ls)
"I love programming with Python -3 .3 ! Do you ? It 's great . . . I give it a 10 /10 . It 's free -to -use , no  $ $ $ involved !"
>>>

Upvotes: 0
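As its output shows, the character loop above also splits off the five exempt characters and only adds a space before each mark. The same build-a-list-then-join idea can be sketched with `re.findall`, collecting words and single punctuation marks as tokens and joining them with one space. This assumes the exempt set from the question (/ ' - + $); the function name `tokenize_and_join` is made up for the example.

```python
import re

# A token is either a run of word/exempt characters (/ ' - + $)
# or a single punctuation mark; whitespace is skipped entirely.
TOKENS = re.compile(r"[\w/'+$-]+|[^\w\s/'+$-]")


def tokenize_and_join(text):
    """Split text into word and punctuation tokens, joined by one space."""
    return " ".join(TOKENS.findall(text))


print(tokenize_and_join("It's great... I give it a 10/10."))
# It's great . . . I give it a 10/10 .
print(tokenize_and_join("Hello. (hi)"))  # Hello . ( hi )
```

Since every character is either whitespace or part of exactly one token, nothing from the input is lost apart from the original spacing.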
