spiff

Reputation: 1495

Python regex replace part of string in a column which occurs after specific regex

I want to remove V, I, or VI only when it occurs inside brackets, as in the example below:

Input:

VINE(PCI); BLUE(PI)
BLACK(CVI)
CINE(PCVI)

Output desired:

VINE(PC); BLUE(P)
BLACK(C)
CINE(PC)

When I use df['col'].str.replace('[PC]+([VI]+)', "") it removes everything inside the brackets, and when I use just df['col'].str.replace('[VI]+', "") it of course doesn't work, because it also removes every other occurrence of V and I. Inside the brackets there will only be these four letters, in any combination of either (or both) of P and C and either (or both) of V and I. What am I doing wrong here?

Thanks

Upvotes: 1

Views: 88

Answers (2)

Neroksi

Reputation: 1398

Another solution using only pandas:

import pandas as pd
S = pd.Series(["VINE(PCI)", "BLUE(PI)", "BLACK(CVI)", 'CINE(PCVI)'])
# Split on either bracket, strip I and V from the middle piece, then reassemble
S.str.split(r'[()]').apply(lambda x: x[0] + "(" + x[1].replace("I", "").replace("V", "") + ")" + x[2])
0    VINE(PC)
1     BLUE(P)
2    BLACK(C)
3    CINE(PC)
dtype: object
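
The split above reassembles only the first bracketed group of each entry, so a row such as "VINE(PCI); BLUE(PI)" from the question would lose its second group. A minimal sketch of one way to handle several groups per row is below. It is an assumption on my part rather than part of the original answer: it uses the standard-library re module instead of pandas string methods, and the helper name strip_vi_in_brackets is hypothetical.

```python
import re

def strip_vi_in_brackets(text):
    # Hypothetical helper. Splitting on a capturing group keeps the bracketed
    # pieces in the result, always at the odd indices.
    pieces = re.split(r'(\([^()]*\))', text)
    return "".join(
        piece.replace("V", "").replace("I", "") if index % 2 else piece
        for index, piece in enumerate(pieces)
    )

rows = ["VINE(PCI); BLUE(PI)", "BLACK(CVI)", "CINE(PCVI)"]
cleaned = [strip_vi_in_brackets(row) for row in rows]
```

With a Series, the same function could be passed to S.apply(strip_vi_in_brackets).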

Upvotes: 0

cs95

Reputation: 402483

Use str.replace with a capture group and callback:

import re
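# regex=True is required in pandas 2.0+, where str.replace defaults to literal matching
df['col'] = df['col'].str.replace(
    r'\((.*?)\)', lambda x: re.sub('[VI]', '', f'({x.group(1)})'), regex=True)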

Or,

df['col'] = df['col'].str.replace(r'\((P|PC|C)[VI]+\)', r'(\1)', regex=True)  # Credit, OP
print(df)
                 col
0  VINE(PC); BLUE(P)
1           BLACK(C)
2           CINE(PC)
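
To check both patterns without building a DataFrame, the sketch below runs them through re.sub on the sample rows from the question. The names rows, drop_vi, via_callback and via_backref are made up for illustration. Note that the brackets are escaped here: in the question's pattern '[PC]+([VI]+)' the unescaped parentheses form a capture group instead of matching literal brackets, which is why the whole bracket content was removed.

```python
import re

rows = ["VINE(PCI); BLUE(PI)", "BLACK(CVI)", "CINE(PCVI)"]

def drop_vi(match):
    # match.group(1) is the text between the literal brackets
    return "(" + re.sub(r'[VI]', '', match.group(1)) + ")"

# Callback variant: rewrite the content of every bracketed group
via_callback = [re.sub(r'\((.*?)\)', drop_vi, row) for row in rows]

# Backreference variant: keep the P/C prefix, drop the trailing V/I run
via_backref = [re.sub(r'\((P|PC|C)[VI]+\)', r'(\1)', row) for row in rows]
```

Both lists should come out the same for this sample. The backreference variant only accepts the prefixes P, PC and C, so a group written as (CPI) would be left unchanged by it.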

Upvotes: 1
