Reputation: 720
I have different strings of the form _AHDHDUHD[Tsfs (SGYA)]AHUDSHDI_ and I want to cut out the (SGYA) part (always capital letters in round brackets), along with any spaces directly before or after it. So the result should be _AHDHDUHD[Tsfs]AHUDSHDI_.
I had the idea of matching the content of the square brackets with ([A-Z_])(\[.+\])([A-Z_]) and then doing a split and re-inserting it using the re module (although I am not sure which re function is suited for this).
However, this feels inelegant. Is there a regex that would do what I want directly, without the intermediary steps?
Upvotes: 0
Views: 66
Reputation: 163207
You could use two capturing groups around the parts you want to keep, and refer to both of them in the replacement as \1\2:
([A-Z_]+\[[^(\s]+)[^\S\r\n]*\([A-Z]+\)[^\S\r\n]*(\][A-Z_]+)
In parts:
( - Start capture group 1
[A-Z_]+ - Match 1+ chars A-Z or _
\[[^(\s]+ - Match [ and 1+ of any chars except ( or a whitespace char
) - Close group
[^\S\r\n]* - Match 0+ whitespace chars except newlines
\([A-Z]+\) - Match 1+ chars A-Z between parentheses
[^\S\r\n]* - Match 0+ whitespace chars except newlines
( - Start capture group 2
\][A-Z_]+ - Match ] and 1+ chars A-Z or _
) - Close group
For example
import re
regex = r"([A-Z_]+\[[^(\s]+)[^\S\r\n]*\([A-Z]+\)[^\S\r\n]*(\][A-Z_]+)"
test_str = "_AHDHDUHD[Tsfs (SGYA)]AHUDSHDI_"
print(re.sub(regex, r"\1\2", test_str))
Output
_AHDHDUHD[Tsfs]AHUDSHDI_
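As a sketch of how the breakdown above could live next to the pattern itself, the same regex can be compiled with re.VERBOSE so that every part carries its own comment. The names PATTERN and strip_tag are made up for this illustration and are not part of the answer.

```python
import re

# Same pattern as in the answer, written in verbose mode.
# PATTERN and strip_tag are hypothetical names used only for this sketch.
PATTERN = re.compile(r"""
    ([A-Z_]+ \[ [^(\s]+)   # group 1: prefix, opening bracket, text up to a space or paren
    [^\S\r\n]*             # optional whitespace, newlines excluded
    \( [A-Z]+ \)           # the capital letters in round brackets (dropped)
    [^\S\r\n]*             # optional whitespace, newlines excluded
    (\] [A-Z_]+)           # group 2: closing bracket and suffix
""", re.VERBOSE)


def strip_tag(text):
    """Keep groups 1 and 2, dropping the parenthesised part between them."""
    return PATTERN.sub(r"\1\2", text)


print(strip_tag("_AHDHDUHD[Tsfs (SGYA)]AHUDSHDI_"))
# prints _AHDHDUHD[Tsfs]AHUDSHDI_
```

Strings that do not contain the full prefix[... (CAPS)]suffix shape are returned unchanged, because re.sub only replaces where the whole pattern matches.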
Upvotes: 0
Reputation: 5372
This will do what you want:
Python 3.7.5 (default, Oct 17 2019, 12:16:48)
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> s='_AHDHDUHD[Tsfs (SGYA)]AHUDSHDI_'
>>> re.sub(r'(?:\s?\((.*)\))', '', s)
'_AHDHDUHD[Tsfs]AHUDSHDI_'
>>>
If you want to match only capital letters inside the round brackets, then the expression should be:
>>> re.sub(r'(?:\s?\(([A-Z]+)\))', '', s)
'_AHDHDUHD[Tsfs]AHUDSHDI_'
>>>
I hope it helps.
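One difference between the two expressions only shows up when a string contains more than one parenthesised part. The sketch below uses a made-up input (not from the question) to compare the greedy .* with the stricter [A-Z]+; the capturing and non-capturing groups are left out because the replacement does not use them.

```python
import re

# Hypothetical input with two tags, chosen only to show the difference.
s = "A[x (AB)]B[y (CD)]C"

# Greedy .* runs from the first "(" to the LAST ")" in the string.
greedy = re.sub(r"\s?\(.*\)", "", s)

# [A-Z]+ cannot cross ")" or "]", so each tag is removed on its own.
strict = re.sub(r"\s?\([A-Z]+\)", "", s)

print(greedy)  # prints A[x]C
print(strict)  # prints A[x]B[y]C
```

With a single tag per string, as in the question, both variants give the same result.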
Upvotes: 1
Reputation: 23
You are looking for the re.sub function
import re
s = "AHDHDUHD[Tsfs (SGYA)]AHUDSHDI"
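# Remove the parenthesised part together with any surrounding whitespace.
# (Replacing the original three-group pattern with '' also deleted the text before it.)
s_re = re.sub(r"\s*\(.*?\)\s*", "", s)
print(s_re)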
It will print:
AHDHDUHD[Tsfs]AHUDSHDI
Upvotes: 0
Reputation: 626728
You may use
re.sub(r'(\[[^][]*?)\s*\([A-Z]*\)\s*([^][]*])', r'\1\2', text)
Details
(\[[^][]*?) - Group 1: a [ and then any 0+ chars other than [ and ], as few as possible
\s* - 0+ whitespaces
\( - a ( char
[A-Z]* - 0+ uppercase ASCII letters
\) - a ) char
\s* - 0+ whitespaces
([^][]*]) - Group 2: any 0+ chars other than ] and [ (as many as possible) and then a ]
import re
rx = r"(\[[^][]*?)\s*\([A-Z]*\)\s*([^][]*])"
s = "_AHDHDUHD[Tsfs (SGYA)]AHUDSHDI"
print( re.sub(rx, r'\1\2', s) )
# => _AHDHDUHD[Tsfs]AHUDSHDI
Another idea: only remove all \s*\([A-Z]+\)\s* matches when found inside [...] substrings:
import re
s = "_AHDHDUHD[Tsfs (SGYA)]AHUDSHDI"
print( re.sub(r"\[[^][]+]", lambda x: re.sub(r'\s*\([A-Z]+\)\s*', "", x.group()), s) )
# => _AHDHDUHD[Tsfs]AHUDSHDI
Here, the \[[^][]+] pattern will find all chunks of [, then 1+ chars other than square brackets and then a ], and then any occurrences of 0+ whitespaces, (, 1+ uppercase ASCII letters, ) and 0+ whitespaces will be removed only inside the matches found with the \[[^][]+] pattern.
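To make the "only inside [...]" behaviour concrete, here is a small sketch comparing the callback approach with a plain global substitution. The input string and the function name strip_inside_brackets are invented for this illustration.

```python
import re


def strip_inside_brackets(text):
    """Remove (CAPS) plus surrounding whitespace, but only within [...] chunks."""
    return re.sub(
        r"\[[^][]+]",
        lambda m: re.sub(r"\s*\([A-Z]+\)\s*", "", m.group()),
        text,
    )


# Hypothetical input: one tag outside square brackets, one inside.
s = "X (AB) [Tsfs (SGYA)] Y"

scoped = strip_inside_brackets(s)
everywhere = re.sub(r"\s*\([A-Z]+\)\s*", "", s)

print(scoped)      # prints X (AB) [Tsfs] Y
print(everywhere)  # prints X[Tsfs] Y
```

The scoped version leaves (AB) alone because it never appears inside a [...] match, while the global substitution removes both tags along with the whitespace around them.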
Upvotes: 1
Reputation: 1743
import re
weirdstring = "_AHDHDUHD[Tsfs (SGYA)]AHUDSHDI_"
weirdstring = re.sub(r'(.*?)(\s*\(.*?\)\s*)(.*?)', r'\1\3', weirdstring)
print(weirdstring)
# prints _AHDHDUHD[Tsfs]AHUDSHDI_
Upvotes: 1