Python regex replace part of string in a column which occurs after specific regex

Question

I want to remove the letters V, I, or VI only when they appear inside brackets, as in the example below:

Input:

VINE(PCI); BLUE(PI)
BLACK(CVI)
CINE(PCVI)

Output desired:

VINE(PC); BLUE(P)
BLACK(C)
CINE(PC)

When I use df['col'].str.replace('[PC]+([VI]+)', "") it replaces everything inside the brackets, and when I use just df['col'].str.replace('[VI]+', "") it of course doesn't work, because it then also removes every other occurrence of V and I. Inside the brackets there will only be these four letters: P and/or C in combination with V and/or I. What am I doing wrong here, please?

Thanks
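A regex-only alternative, not taken from either answer, is a lookahead: remove a V or I only when everything between it and the next closing bracket is one of the four bracket letters. This is a sketch that assumes, as the question states, that brackets contain nothing but P, C, V and I; the sample list is copied from the question's input.

```python
import re

# Sample values copied from the question's input.
values = ["VINE(PCI); BLUE(PI)", "BLACK(CVI)", "CINE(PCVI)"]

# Match a single V or I only if the text after it, up to a ")", consists
# solely of the bracket letters P, C, V, I. The V and I in a word such as
# VINE are followed by other letters or "(" and are therefore left alone.
pattern = re.compile(r"[VI](?=[PCVI]*\))")

cleaned = [pattern.sub("", v) for v in values]
print(cleaned)
# ['VINE(PC); BLUE(P)', 'BLACK(C)', 'CINE(PC)']
```

On a pandas column the same pattern should work as df['col'].str.replace(r'[VI](?=[PCVI]*\))', '', regex=True).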

cs95 · Accepted Answer

Use str.replace with a capture group and callback:

import re
# Capture the bracket contents, strip V/I from them, and restore the brackets
df['col'] = df['col'].str.replace(
    r'\((.*?)\)', lambda x: re.sub('[VI]', '', f'({x.group(1)})'), regex=True)
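A self-contained version of the callback approach, with the sample frame rebuilt from the question's input (column name col as in the question). The explicit regex=True matters: pandas only accepts a callable replacement when regex matching is on, and since pandas 2.0 the default is regex=False.

```python
import re

import pandas as pd

# Frame built from the question's sample input.
df = pd.DataFrame({"col": ["VINE(PCI); BLUE(PI)", "BLACK(CVI)", "CINE(PCVI)"]})


def strip_vi(match):
    # match.group(1) is the text between the parentheses (non-greedy, so each
    # bracket pair is handled separately); drop every V and I, re-add brackets.
    return "(" + re.sub("[VI]", "", match.group(1)) + ")"


# A callable replacement is passed the re.Match object for each hit.
df["col"] = df["col"].str.replace(r"\((.*?)\)", strip_vi, regex=True)
print(df)
```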

Or,

df['col'] = df['col'].str.replace(r'\((P|PC|C)[VI]+\)', r'(\1)', regex=True)  # Credit, OP
print(df)
                 col
0  VINE(PC); BLUE(P)
1           BLACK(C)
2           CINE(PC)
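The backreference version can be loosened slightly by replacing the alternation (P|PC|C) with the character class [PC]+, which also accepts the order CP (the alternation would leave a bracket such as (CPV) untouched). This sketch assumes, as in all of the question's examples, that P/C letters come before V/I letters inside the brackets; GREEN(CPV) and RED(PC) are hypothetical inputs added for illustration.

```python
import re

# Capture the P/C run; the following V/I run (at least one letter) is dropped.
PATTERN = r"\(([PC]+)[VI]+\)"


def drop_vi(text):
    # \1 re-inserts the captured P/C letters inside fresh brackets.
    return re.sub(PATTERN, r"(\1)", text)


samples = ["VINE(PCI); BLUE(PI)", "BLACK(CVI)", "CINE(PCVI)", "GREEN(CPV)"]
results = [drop_vi(s) for s in samples]
print(results)
```

Brackets that contain no V or I, such as (PC), do not match because [VI]+ needs at least one letter, so they pass through unchanged.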
