Reputation: 403
I have strings such as the following:
s1 = 'Hello , this is a [ test ] string with ( parenthesis ) .'
I am trying to remove whitespace surrounding punctuation so it should look like this:
s1 = 'Hello, this is a [test] string with (parenthesis).'
I found this bit of code here: How to strip whitespace from before but not after punctuation in python
req = re.sub(r'\s([?,.!"](?:\s|$))', r'\1', text)
I added ] and ) to the character class so that whitespace before ] or ) is removed as well:
req = re.sub(r'\s([?,.!\])"](?:\s|$))', r'\1', text)
So it now looks like this:
s1 = 'Hello, this is a [ test] string with ( parenthesis).'
Now I have been trying to adjust this to also remove the whitespace after [ or ( but I can't figure out how. I am very confused when it comes to regex.
I understand re.sub() replaces whatever the first argument matches with the second argument (r'\1'), but I don't understand what r'\1' actually means.
Any help would be appreciated,
Cheers
Upvotes: 2
Views: 1070
Reputation: 37755
One way is to leave the whitespace just inside the brackets out of the capture groups, i.e.
(opening bracket)  whitespace  (captured text)  whitespace  (closing bracket)
        |                             |                             |
     Group 1                       Group 2                       Group 3
Then use alternation to also match a . or , preceded by whitespace,
capturing the punctuation in a separate group (group 4):
([[({])\s*(.*?)\s*([\]\)\}])|\s+([,.])
Replace each match with \1\2\3\4, which writes the captured pieces back without the uncaptured whitespace.
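In Python this could be sketched roughly as below. The name `tighten` is made up for illustration, and the opening `[` inside the first character class is escaped to avoid the "possible nested set" FutureWarning that recent Python versions emit for `[[`. The sketch assumes Python 3.5 or later, where a group that took no part in the match is substituted as an empty string; older versions raise an error for an unmatched group in the replacement.

```python
import re

# Groups 1-3: opening bracket, inner text, closing bracket.
# The \s* padding next to the brackets is matched but not captured.
# Group 4: a comma or period that was preceded by whitespace.
PATTERN = re.compile(r'([\[({])\s*(.*?)\s*([\])}])|\s+([,.])')

def tighten(text):
    # \1..\4 are backreferences: each one is replaced by whatever
    # the corresponding group captured (empty if it did not match).
    return PATTERN.sub(r'\1\2\3\4', text)

s1 = 'Hello , this is a [ test ] string with ( parenthesis ) .'
print(tighten(s1))  # Hello, this is a [test] string with (parenthesis).
```

This also shows what r'\1' means in the original snippet: it stands for the text captured by the first pair of parentheses in the pattern, so the match is replaced by that captured text alone.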
Upvotes: 1
Reputation: 82765
Using a lookbehind and a lookahead might help:
import re
s1 = 'Hello , this is a [ test ] string with ( parenthesis ).'
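# First alternative: whitespace followed by a punctuation character.
# Second alternative: everything between an opening and a closing bracket.
# Each match is replaced by itself with leading/trailing whitespace stripped.
print(re.sub(r'(\s([?,.!"]))|(?<=\[|\()(.*?)(?=\)|\])', lambda x: x.group().strip(), s1))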
Output:
Hello, this is a [test] string with (parenthesis).
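A variation on the same idea, offered only as a sketch: match just the whitespace itself and delete it, so no callback is needed. The name `strip_padding` and the exact punctuation set are assumptions for illustration. Unlike the code above, this removes whitespace after any opening bracket even when there is no matching closing bracket.

```python
import re

# Whitespace right after an opening bracket (lookbehind), or
# whitespace right before a closing bracket or punctuation (lookahead).
PADDING = re.compile(r'(?<=[\[(])\s+|\s+(?=[\])?,.!])')

def strip_padding(text):
    # The lookarounds assert context without consuming it,
    # so only the whitespace is removed.
    return PADDING.sub('', text)

s1 = 'Hello , this is a [ test ] string with ( parenthesis ) .'
print(strip_padding(s1))  # Hello, this is a [test] string with (parenthesis).
```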
Upvotes: 2