Reputation: 26039
Let's say I have a string like this:
s = '(X_xy09 and X_foobar or (X_abc123 and X_something))'
and I want to turn it into
'(xy09 and foobar or (abc123 and something))'
then - in this particular case - I could simply do
s.replace('X_', "")
which gives the desired output.
However, my actual data may contain not only X_ but also other prefixes, so the replace call above does not work in general.
What I would need instead is a replacement of
a capital letter followed by an underscore and an arbitrary sequence of letters and numbers
by
everything after the first underscore.
So, to extract the desired elements I could use:
import re
print(re.findall('[A-Z]{1}_[a-zA-Z0-9]+', s))
which prints
['X_xy09', 'X_foobar', 'X_abc123', 'X_something']
How can I now replace those elements so that I obtain the following?
'(xy09 and foobar or (abc123 and something))'
Upvotes: 2
Views: 86
Reputation: 406125
If you just need to replace a capital letter followed by an underscore, you can use the regular expression r'[A-Z]_'.
import re
s = '(X_xy09 and X_foobar or (X_abc123 and X_something))'
print(re.sub(r'[A-Z]_', '', s))
You may need to add to it if you have other criteria not mentioned. (For example, some of your target values follow a word boundary and some follow parentheses.) The above might give you the wrong output if you have input like XY_something, which would become Xsomething. It depends on what you expect the output to be.
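To make that caveat concrete, here is a small sketch that runs the bare pattern over a few inputs. The sample strings are made up for illustration and do not come from the question.

```python
import re

pattern = r'[A-Z]_'

# Hypothetical inputs: a normal prefix, a two-letter prefix, a prefix inside a word
samples = ['X_abc', 'XY_something', 'fooX_bar']

# Map each input to the result of stripping every "capital letter + underscore"
results = {text: re.sub(pattern, '', text) for text in samples}

for text, out in results.items():
    print(f'{text!r} -> {out!r}')
```

Only the letter directly before the underscore is removed, so a two-letter prefix loses just its second letter, and a match in the middle of a word is stripped as well.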
Upvotes: 3
Reputation: 150178
You could use re.sub() with a lookahead assertion:
>>> import re
>>> s = '(X_xy09 and X_foobar or (X_abc123 and X_something))'
>>> re.sub(r'\b[A-Z]_(?=[a-zA-Z0-9])', '', s)
'(xy09 and foobar or (abc123 and something))'
From the docs:
(?=...)
Matches if ... matches next, but doesn't consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it's followed by 'Asimov'.
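As a rough illustration of what "doesn't consume" means for the substitution, the sketch below compares the lookahead pattern with a variant that matches the following character outright. The variable names and the short test strings are assumptions chosen for the example.

```python
import re

# Lookahead: asserts that an alphanumeric char follows, but leaves it in place
lookahead = r'\b[A-Z]_(?=[a-zA-Z0-9])'
# Consuming variant: the following char becomes part of the match and is removed too
consuming = r'\b[A-Z]_[a-zA-Z0-9]'

kept = re.sub(lookahead, '', 'X_abc')   # only the prefix is removed
lost = re.sub(consuming, '', 'X_abc')   # the prefix and the first letter are removed
bare = re.sub(lookahead, '', '(X_)')    # nothing alphanumeric follows, so no match

print(kept, lost, bare)
```

With the lookahead, only the prefix is part of the match, which is why an empty replacement string is enough.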
Upvotes: 2
Reputation: 92904
Another re.sub() approach:
import re
s = '(X_xy09 and X_foobar or (X_abc123 and X_something))'
result = re.sub(r'[A-Z]_(?=[a-zA-Z0-9]+)', '', s)
print(result)
The output:
(xy09 and foobar or (abc123 and something))
[A-Z]_(?=[a-zA-Z0-9]+)
- (?=...) is a positive lookahead assertion; it ensures that the substituted [A-Z]_ substring is followed by an alphanumeric sequence [a-zA-Z0-9]+
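One side note, offered as a sketch rather than a correction: because a lookahead succeeds as soon as one alphanumeric character follows, the + inside it should not change which prefixes get removed. The check below compares both spellings on a few strings; all inputs other than the question's s are invented for the test.

```python
import re

with_plus = r'[A-Z]_(?=[a-zA-Z0-9]+)'
without_plus = r'[A-Z]_(?=[a-zA-Z0-9])'

samples = [
    '(X_xy09 and X_foobar or (X_abc123 and X_something))',  # from the question
    'XY_something',  # hypothetical two-letter prefix
    'A_ B_1',        # hypothetical: first prefix is followed by a space
    'X__y',          # hypothetical: prefix is followed by another underscore
]

# Apply both patterns to every sample and collect the results side by side
outputs = [(re.sub(with_plus, '', t), re.sub(without_plus, '', t)) for t in samples]

for text, (a, b) in zip(samples, outputs):
    print(f'{text!r} -> {a!r} (same result: {a == b})')
```

Note that this pattern has no leading \b, so it still turns XY_something into Xsomething.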
Upvotes: 2
Reputation: 627537
If you need to remove an uppercase ASCII letter with an underscore after it, only when it is not preceded by a word char and is followed by an alphanumeric char, you may use
import re
s = '(X_xy09 and X_foobar or (X_abc123 and X_something))'
print(re.sub(r'\b[A-Z]_([a-zA-Z0-9])', r'\1', s))
See the Python demo and a regex demo.
Pattern details
\b - a leading word boundary
[A-Z]_ - an ASCII uppercase letter and _
([a-zA-Z0-9]) - Group 1 (later referenced with \1 from the replacement pattern): 1 alphanumeric char.
Upvotes: 3