Cleb

Reputation: 26039

Replace strings in a string by a substring of those strings

Let's say I have a string like this:

s = '(X_xy09 and X_foobar or (X_abc123 and X_something))'

and I want to turn it into

'(xy09 and foobar or (abc123 and something))'

then - in this particular case - I could simply do

s.replace('X_', "")

which gives the desired output.

However, my actual data may contain other prefixes besides X_, so the replace call above is not sufficient.

What I would need instead is a replacement of

a capital letter followed by an underscore and an arbitrary sequence of letters and numbers

by

everything after the first underscore.

So, to extract the desired elements I could use:

import re
print(re.findall(r'[A-Z]_[a-zA-Z0-9]+', s))

which prints

['X_xy09', 'X_foobar', 'X_abc123', 'X_something']

How can I now replace those elements so that I obtain

'(xy09 and foobar or (abc123 and something))'

?

Upvotes: 2

Views: 86

Answers (4)

Bill the Lizard

Reputation: 406125

If you just need to replace a capital letter followed by an underscore, you can use the regular expression r'[A-Z]_'.

import re
s = '(X_xy09 and X_foobar or (X_abc123 and X_something))'
re.sub(r'[A-Z]_', '', s)

You may need to extend the pattern if you have other criteria not mentioned (for example, some of your target values follow a word boundary and some follow parentheses). The pattern above might give you the wrong output for input like XY_something, where only Y_ would be removed. It depends on what you expect the output to be.
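To make the XY_something caveat concrete, here is a small sketch. The input string is made up for illustration, and the word-boundary variant is just one possible refinement, assuming multi-letter prefixes should be left untouched.

```python
import re

# Hypothetical input with a two-letter prefix (not from the question).
s = '(X_xy09 and XY_something)'

# Plain pattern: also strips the "Y_" out of "XY_something".
plain = re.sub(r'[A-Z]_', '', s)

# With a leading word boundary, only single-letter prefixes are removed.
bounded = re.sub(r'\b[A-Z]_', '', s)

print(plain)    # (xy09 and Xsomething)
print(bounded)  # (xy09 and XY_something)
```

Which of the two results is "right" depends on how prefixes longer than one letter should be treated.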

Upvotes: 3

Eugene Yarmash

Reputation: 150178

You could use re.sub() with a lookahead assertion:

>>> import re
>>> s = '(X_xy09 and X_foobar or (X_abc123 and X_something))'
>>> re.sub(r'\b[A-Z]_(?=[a-zA-Z0-9])', '', s)
'(xy09 and foobar or (abc123 and something))'

From the docs:

(?=...)
Matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.
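The docs example can be tried directly. This is only a sketch with a made-up sentence; it shows that the lookahead restricts where the pattern matches while keeping 'Asimov' out of the matched text.

```python
import re

# Made-up sample text containing 'Isaac ' twice.
text = 'Isaac Asimov and Isaac Newton'

# The lookahead checks for 'Asimov' but leaves it out of the match.
matches = [(m.start(), m.group()) for m in re.finditer(r'Isaac (?=Asimov)', text)]

print(matches)  # [(0, 'Isaac ')]
```

Only the first 'Isaac ' is matched, and the match itself stops before 'Asimov'.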

Upvotes: 2

RomanPerekhrest

Reputation: 92904

Another re.sub() approach:

import re

s = '(X_xy09 and X_foobar or (X_abc123 and X_something))'
result = re.sub(r'[A-Z]_(?=[a-zA-Z0-9]+)', '', s)

print(result)

The output:

(xy09 and foobar or (abc123 and something))

  • [A-Z]_(?=[a-zA-Z0-9]+) - an uppercase letter and an underscore, followed by a positive lookahead assertion (?=...) which ensures that the removed [A-Z]_ substring is followed by at least one alphanumeric character, without consuming it
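A quick sketch of how the lookahead behaves on an edge case, using a made-up input where one prefix is not followed by an alphanumeric character. It also compares the pattern with and without the + inside the lookahead; since a lookahead only needs one following character to succeed, both should behave the same here.

```python
import re

# Made-up input: 'Y_' is followed by a space, so it should be left alone.
s = 'X_abc and Y_ or Z_9'

with_plus = re.sub(r'[A-Z]_(?=[a-zA-Z0-9]+)', '', s)
without_plus = re.sub(r'[A-Z]_(?=[a-zA-Z0-9])', '', s)

print(with_plus)                  # abc and Y_ or 9
print(with_plus == without_plus)  # True
```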

Upvotes: 2

Wiktor Stribiżew

Reputation: 627537

If you need to remove an uppercase ASCII letter followed by an underscore, but only when it is not preceded by a word character and is followed by an alphanumeric character, you may use

import re
s = '(X_xy09 and X_foobar or (X_abc123 and X_something))'
print(re.sub(r'\b[A-Z]_([a-zA-Z0-9])', r'\1', s))


Pattern details

  • \b - a leading word boundary
  • [A-Z]_ - an ASCII uppercase letter and _
  • ([a-zA-Z0-9]) - Group 1 (referenced as \1 in the replacement pattern): one alphanumeric char, which is put back into the result.
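As a rough check of the pattern details above, here is the same substitution on a made-up string with several different prefixes. The aX_keep token is a hypothetical case added to show the effect of the word boundary.

```python
import re

# Made-up input: three different single-letter prefixes, plus 'aX_keep',
# where 'X_' is preceded by a word character and should not be touched.
s = '(X_xy09 and Y_foobar or (Z_abc123 and aX_keep))'

# Group 1 captures the first alphanumeric char so that \1 can restore it.
result = re.sub(r'\b[A-Z]_([a-zA-Z0-9])', r'\1', s)

print(result)  # (xy09 and foobar or (abc123 and aX_keep))
```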

Upvotes: 3
