Reputation: 124
I have strings of the form:
s = "Wow that is really nice, ( 2.1 ) shows that according to the drawings in ( 1. 1) and a) there are errors."
and I would like to get a cleaned string in the form of:
s = "Wow that is really nice, (2.1) shows that according to the drawings in (1.1) and a) there are errors."
I tried to fix it with regex:
import re
regex = r" (?=[^(]*\))"
s = "Wow that is really nice, ( 2.1 ) shows that according to the drawings in ( 1. 1) and a) there are some errors."
re.sub(regex, "", s)
But I get faulty results like this:
Wow that is really nice, (2.1) shows that according to the drawings in (1.1)anda) there are some errors.
Does anyone know how to deal with this problem when you don't always have the same number of opening and closing brackets?
Upvotes: 0
Views: 95
Reputation: 163362
If you also want to match balanced parentheses and remove the spaces inside them, you can use the PyPI regex module and a recursive pattern:
\([^)(]*+(?:(?R)[^)(]*)*+\)
See a regex demo.
Note that it will remove all spaces inside the matched parentheses.
import regex
pattern = r"\([^)(]*+(?:(?R)[^)(]*)*+\)"
s = ("Wow that is really nice, ( 2.1 ) shows that according to the drawings in ( 1. 1) and a) there are errors.\n"
"Wow that is really nice, ( 2.1 (2.1 ( 1,3 ) ) )shows that according to the drawings in ( 1. 1) and a) there are errors.")
print(regex.sub(pattern, lambda m: m[0].replace(" ", ""), s))
Output
Wow that is really nice, (2.1) shows that according to the drawings in (1.1) and a) there are errors.
Wow that is really nice, (2.1(2.1(1,3)))shows that according to the drawings in (1.1) and a) there are errors.
To remove only the spaces directly after the ( and directly before the ):
import regex
pattern = r"\([^)(]*+(?:(?R)[^)(]*)*+\)"
s = "Wow that is really nice, ( test in 2.1 (2.1 test( 1,3 test ) ) )shows that according to the drawings in ( 1. 1) and a) there are errors."
print(regex.sub(pattern, lambda m: regex.sub(r"(?<=\() +| +(?=\))", "", m[0]), s))
Output
Wow that is really nice, (test in 2.1 (2.1 test(1,3 test)))shows that according to the drawings in (1. 1) and a) there are errors.
Upvotes: 0
Reputation: 7026
try
r" (?=[^()]*\))"
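This also excludes the closing parenthesis from the characters allowed between the space and the next ')'.
Whether this works depends on whether you have nested brackets in your text. Note that a stray ')' with no opening bracket, such as the one in 'a)', still satisfies the lookahead, so the spaces before it are removed as well.
Arbitrarily nested brackets cannot be handled by the built-in re module, which has no recursion; you need a parser (it may need to count the brackets).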
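The bracket-counting idea could be sketched roughly as follows. This is only an illustration, not code from the answer: the function name strip_spaces_in_parens is made up, and it assumes that a ')' with no open '(' before it (like the one in 'a)') and a '(' that is never closed should both be left alone.

```python
def strip_spaces_in_parens(text: str) -> str:
    """Remove spaces that sit inside a matched pair of parentheses.

    Hypothetical helper: unmatched '(' or ')' are ignored entirely.
    """
    stack = []                      # indices of '(' still waiting for a ')'
    delta = [0] * (len(text) + 1)   # +1 where a matched pair opens, -1 where it closes
    for i, ch in enumerate(text):
        if ch == "(":
            stack.append(i)
        elif ch == ")" and stack:   # a ')' with an empty stack is unmatched
            start = stack.pop()
            delta[start] += 1
            delta[i] -= 1

    out = []
    depth = 0                       # number of matched pairs enclosing position i
    for i, ch in enumerate(text):
        depth += delta[i]
        if ch == " " and depth > 0:
            continue                # drop spaces inside matched parentheses
        out.append(ch)
    return "".join(out)


s = "Wow that is really nice, ( 2.1 ) shows that according to the drawings in ( 1. 1) and a) there are errors."
print(strip_spaces_in_parens(s))
```

Because only matched pairs change the depth, the stray 'a)' never counts as closing a group, and nested groups are handled without recursion.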
Upvotes: 0
Reputation: 7970
You can match all the innermost parentheses with a simple regex, and then perform a substitution on the matches to remove all the spaces.
import re
s = "Wow that is really nice, ( 2.1 ) shows that according to the drawings in ( 1. 1) and a) there are errors."
regex = r"\([^\(\)]*\)"
# m is the match object; named m so it does not shadow the string s
res = re.sub(regex, lambda m: m[0].replace(" ", ""), s)
print(res)
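For the string in the question this prints the desired result. As a quick check of what "innermost" means here, the small sketch below runs the same pattern on a nested input; the nested sample string and the helper name clean_innermost are made up for illustration.

```python
import re

INNERMOST = r"\([^()]*\)"  # a '(' ... ')' pair with no other parentheses inside


def clean_innermost(text: str) -> str:
    # Remove spaces only inside the innermost matched pairs.
    return re.sub(INNERMOST, lambda m: m[0].replace(" ", ""), text)


print(clean_innermost("a ( 2.1 ( 1,3 ) ) b"))
# a ( 2.1 (1,3) ) b
```

Only the inner group is cleaned; the spaces that belong to the outer pair are left untouched, so this approach fits texts without nesting.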
Upvotes: 1
Reputation: 17
I am not sure this covers every case, but you can try the following:
s = s.replace('( ','(')
s = s.replace(' )',')')
Here replace(old, new) is a standard string method that replaces every occurrence of old with new. I hope it helps.
Upvotes: 2
Reputation: 610
If the only whitespace you want to remove is the whitespace directly after an opening bracket (or directly before a closing one), then a simple string replace might work:
>>> s.replace("( ", "(").replace(" )", ")")
'Wow that is really nice, (2.1) shows that according to the drawings in (1. 1) and a) there are errors.'
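If there may be more than one space, or tabs, next to a bracket, the same idea could be written with re.sub and \s+ instead of a literal single space. This is a sketch under that assumption; the function name trim_inside_brackets is made up. Like the replace version, it leaves whitespace in the middle of a group (as in "1. 1") untouched.

```python
import re


def trim_inside_brackets(text: str) -> str:
    # Drop any run of whitespace directly after '(' ...
    text = re.sub(r"\(\s+", "(", text)
    # ... and any run of whitespace directly before ')'.
    return re.sub(r"\s+\)", ")", text)


s = "Wow that is really nice, ( 2.1 ) shows that according to the drawings in ( 1. 1) and a) there are errors."
print(trim_inside_brackets(s))
# Wow that is really nice, (2.1) shows that according to the drawings in (1. 1) and a) there are errors.
```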
Upvotes: 1