Reputation: 7733
I have a data frame like the one shown below:
import pandas as pd
import numpy as np
df = pd.DataFrame({'s_id': [1, 2, 3, 4, 5, 6, 7, 8],
                   'test': ['Metformin', 'Glipizide', 'Gliclazide', 'Glibenclamide',
                            'Repaglinide', 'nateglinide', 'sitagliptin', 'linagliptin']})
I would like to create a new column called op based on the values in the test column, according to the criteria below:
If the test value contains the *etformi* pattern, then the op column should have the value C1.
If the test value contains the gli* pattern, then the op column should have the value C2.
If the test value contains the *nide pattern, then the op column should have the value C3.
If the test value contains the *gliptin pattern, then the op column should have the value C4.
I tried the following, but it doesn't work as expected, and I'm not sure how to combine the conditions into a single column:
df['test'].str.contains('metformi*', case=False, regex=True)
df['test'].str.contains(('gli*&&*ide'), case=False, regex=True)
df['test'].str.contains(('*nide'), case=False, regex=True)
df['test'].str.contains(('*gliptin'), case=False, regex=True)
I expect my output to look like the one shown below.
Upvotes: 0
Views: 612
Reputation: 18647
You could use numpy.select, supplying a condlist and a choicelist:
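import numpy as np
# Conditions are checked in order; np.select keeps the first one that matches.
condlist = [
    df['test'].str.contains('etformi', case=False, regex=True),
    df['test'].str.contains('^gli.*ide$', case=False, regex=True),
    df['test'].str.contains('nide$', case=False, regex=True),
    df['test'].str.contains('gliptin$', case=False, regex=True)
]
choicelist = ['C1', 'C2', 'C3', 'C4']
# Set default explicitly: np.select's built-in default is the integer 0,
# which does not mix cleanly with string choices.
df['op'] = np.select(condlist, choicelist, default='')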
[out]
   s_id           test  op
0     1      Metformin  C1
1     2      Glipizide  C2
2     3     Gliclazide  C2
3     4  Glibenclamide  C2
4     5    Repaglinide  C3
5     6    nateglinide  C3
6     7    sitagliptin  C4
7     8    linagliptin  C4
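The patterns in the question use glob-style wildcards, but str.contains with regex=True expects regular expressions. In a regex, * is a quantifier rather than a wildcard, so 'gli*' means "gl followed by zero or more i", and a leading * as in '*nide' is an error because there is nothing to repeat. The regex equivalents are .* for "anything", ^ for "starts with" and $ for "ends with". Below is a self-contained sketch of the same numpy.select approach. The rules list, the na=False argument and the 'unmatched' default label are my own assumptions for illustration and are not part of the original question or answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    's_id': [1, 2, 3, 4, 5, 6, 7, 8],
    'test': ['Metformin', 'Glipizide', 'Gliclazide', 'Glibenclamide',
             'Repaglinide', 'nateglinide', 'sitagliptin', 'linagliptin'],
})

# Hypothetical (regex, label) pairs, listed in priority order.
# np.select keeps the first condition that is True for each row, so
# 'Glibenclamide' gets C2 (starts with gli, ends with ide) before the
# C3 rule is ever considered.
rules = [
    (r'etformi', 'C1'),      # glob *etformi*  -> substring match
    (r'^gli.*ide$', 'C2'),   # glob gli*ide    -> anchored at both ends
    (r'nide$', 'C3'),        # glob *nide      -> anchored at the end
    (r'gliptin$', 'C4'),     # glob *gliptin   -> anchored at the end
]

# na=False treats missing values as "no match" instead of propagating NaN.
condlist = [
    df['test'].str.contains(pattern, case=False, regex=True, na=False)
    for pattern, _ in rules
]
choicelist = [label for _, label in rules]

# Assumed fallback label for rows that match none of the patterns.
df['op'] = np.select(condlist, choicelist, default='unmatched')
print(df)
```

Keeping the patterns and labels side by side in one list makes the priority order explicit and makes it easy to add a fifth rule later. Passing a string default also avoids mixing np.select's integer default of 0 with string choices, which, depending on the NumPy version, may be coerced to '0' or rejected with a type error.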
Upvotes: 3