Reputation: 1495
I want to remove occurrences of V, I, or VI only when they appear inside brackets, as below:
Input:
VINE(PCI); BLUE(PI)
BLACK(CVI)
CINE(PCVI)
Output desired:
VINE(PC); BLUE(P)
BLACK(C)
CINE(PC)
When I use
df['col'].str.replace('[PC]+([VI]+)', "")
it replaces everything inside the brackets, and when I use just
df['col'].str.replace('[VI]+', "")
it of course doesn't work, as it then also removes every other occurrence of V and I.
Inside the brackets there will only be these four letters, in any combination of P and C (either or both) together with V and I (either or both).
What am I doing wrong here, please?
Thanks
Upvotes: 1
Views: 88
Reputation: 1398
Another solution using only pandas (it assumes one bracket pair per string):
import pandas as pd
S = pd.Series(["VINE(PCI)", "BLUE(PI)", "BLACK(CVI)", 'CINE(PCVI)'])
# Split on either bracket: x[0] is the text before, x[1] the bracket contents, x[2] the text after
S.str.split(r'[()]').apply(lambda x: x[0] + "(" + x[1].replace("I", "").replace("V", "") + ")" + x[2])
0 VINE(PC)
1 BLUE(P)
2 BLACK(C)
3 CINE(PC)
dtype: object
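The split idea above only reads the first bracket pair, so a row like "VINE(PCI); BLUE(PI)" from the question would lose its second group. The following is a minimal sketch of one way to extend it, assuming brackets are balanced and never nested; the helper name strip_vi_in_brackets is hypothetical and not part of pandas.

```python
import re


def strip_vi_in_brackets(text: str) -> str:
    """Remove V and I from every bracketed group in text.

    Assumes balanced, non-nested brackets (an assumption, not checked here).
    """
    # Splitting on "(" or ")" puts bracket contents at the odd indices.
    parts = re.split(r"[()]", text)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Inside a bracket pair: drop V and I, then restore the brackets.
            out.append("(" + part.replace("V", "").replace("I", "") + ")")
        else:
            # Outside brackets: keep the text untouched.
            out.append(part)
    return "".join(out)


print(strip_vi_in_brackets("VINE(PCI); BLUE(PI)"))  # VINE(PC); BLUE(P)
```

With a DataFrame this could be applied as df['col'].apply(strip_vi_in_brackets).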
Upvotes: 0
Reputation: 402483
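The parentheses in your pattern are unescaped, so they form a regex group instead of matching literal brackets, and the whole match (P/C letters included) is replaced with an empty string.
Use str.replace with a capture group and a callback: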
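import re
# regex=True is required for a callable replacement in pandas 2.0+, where regex defaults to False
df['col'] = df['col'].str.replace(
    r'\((.*?)\)', lambda x: re.sub('[VI]', '', f'({x.group(1)})'),
    regex=True)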
Or,
df['col'] = df['col'].str.replace(r'\((P|PC|C)[VI]+\)', r'(\1)', regex=True) # Credit, OP
print(df)
col
0 VINE(PC); BLUE(P)
1 BLACK(C)
2 CINE(PC)
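For anyone who wants to check both patterns end to end, here is a self-contained sketch that builds the question's sample data and compares each result with the desired output. The DataFrame contents are copied from the question; the variable names are only illustrative, and regex=True is passed explicitly on the assumption of a recent pandas version.

```python
import re

import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({"col": ["VINE(PCI); BLUE(PI)", "BLACK(CVI)", "CINE(PCVI)"]})

# Callback version: match each bracketed group, strip V and I from the match.
callback_result = df["col"].str.replace(
    r"\((.*?)\)",
    lambda m: re.sub(r"[VI]", "", m.group(0)),
    regex=True,
)

# Backreference version: keep the P/C prefix, drop the trailing V/I run.
backref_result = df["col"].str.replace(
    r"\((P|PC|C)[VI]+\)", r"(\1)", regex=True
)

expected = ["VINE(PC); BLUE(P)", "BLACK(C)", "CINE(PC)"]
print(callback_result.tolist() == expected)
print(backref_result.tolist() == expected)
```

The backreference version relies on the bracket contents always being P and/or C followed by V and/or I, as the question states; the callback version makes no such assumption about the contents.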
Upvotes: 1