codiearcher

Reputation: 403

Regex for removing whitespace after a parenthesis python

I have strings such as the following:

s1 = 'Hello , this is a [ test ] string with ( parenthesis ) .'

I am trying to remove whitespace surrounding punctuation so it should look like this:

s1 = 'Hello, this is a [test] string with (parenthesis).'

I found this bit of code here: How to strip whitespace from before but not after punctuation in python

req = re.sub(r'\s([?,.!"](?:\s|$))', r'\1', text)

I added ] and ) to the regex so that it also removes the whitespace before ] or )

req = re.sub(r'\s([?,.!\])"](?:\s|$))', r'\1', text)

So it now looks like this:

s1 = 'Hello, this is a [ test] string with ( parenthesis).'

Now I have been trying to adjust this to also remove the whitespace after [ or ( but I can't figure out how. I am very confused when it comes to regex.

I understand re.sub() replaces whatever the first argument matches with the second argument (r'\1'), but I don't understand what r'\1' actually means.

Any help would be appreciated,

Cheers

Upvotes: 2

Views: 1070

Answers (2)

Code Maniac

Reputation: 37755

One way is to match, but not capture, the whitespace at the start and end inside the brackets, i.e.

 (parens start) some space (capture text) some space (parens close)
      |                          |                         |
   Group 1                   Group 2                    Group 3

Match a . or , that is preceded by whitespace using alternation, and capture it in a separate group (Group 4):

([\[({])\s*(.*?)\s*([\]\)\}])|\s+([,.])

Replace by \1\2\3\4
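In Python this could be applied with re.sub, roughly as in the sketch below. The helper name tidy is made up for illustration, and the opening bracket is written as \[ inside the character class, which keeps Python from emitting a "possible nested set" FutureWarning. In the replacement string, \1 to \4 are backreferences: each one inserts whatever the corresponding group captured, and a group that did not take part in the match contributes an empty string (Python 3.5 and later).

```python
import re

# Groups 1-3: opening bracket, inner text, closing bracket.
# The \s* parts sit outside the groups, so the padding is matched but dropped.
# Group 4: a comma or period that was preceded by whitespace.
PATTERN = re.compile(r'([\[({])\s*(.*?)\s*([\])}])|\s+([,.])')


def tidy(text):
    """Hypothetical helper: rebuild each match from its captured groups only."""
    # Only one side of the alternation matches at a time, so either
    # \1\2\3 or \4 is non-empty for any given match.
    return PATTERN.sub(r'\1\2\3\4', text)


s1 = 'Hello , this is a [ test ] string with ( parenthesis ) .'
print(tidy(s1))
```

With the string from the question this should print Hello, this is a [test] string with (parenthesis).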

Upvotes: 1

Rakesh

Reputation: 82765

Using a lookbehind and a lookahead might help here.

import re

s1 = 'Hello , this is a [ test ] string with ( parenthesis ).'
#print(re.sub(r"(?<=\[|\()(.*?)(?=\)|\])", lambda x: x.group().strip(), s1))
print(re.sub(r'(\s([?,.!"]))|(?<=\[|\()(.*?)(?=\)|\])', lambda x: x.group().strip(), s1))

Output:

Hello, this is a [test] string with (parenthesis).
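For anyone unsure what the lambda is doing: when the replacement passed to re.sub is a function, it is called once per match with the match object, and whatever it returns is inserted in place of that match. Here is a rough sketch of the same pattern with a named function instead of the lambda (strip_match and clean are names invented for this example), run on the string from the question, which has a space before the final period:

```python
import re

# Same pattern as above: either whitespace followed by punctuation,
# or the text between an opening and a closing bracket (found via lookarounds,
# so the brackets themselves are not part of the match).
PATTERN = re.compile(r'(\s([?,.!"]))|(?<=\[|\()(.*?)(?=\)|\])')


def strip_match(match):
    # Called by re.sub for every match; the returned string replaces it.
    return match.group().strip()


def clean(text):
    """Hypothetical wrapper around re.sub with a callable replacement."""
    return PATTERN.sub(strip_match, text)


question_s1 = 'Hello , this is a [ test ] string with ( parenthesis ) .'
print(clean(question_s1))
```

Because every match is simply stripped, the function does not need to know which side of the alternation matched.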

Upvotes: 2
