How to apply regex for multiple values to create a new column in pandas?

Question

I have a DataFrame like the one shown below:

import pandas as pd
import numpy as np
df=pd.DataFrame({'s_id':[1,2,3,4,5,6,7,8],
                'test':['Metformin','Glipizide','Gliclazide','Glibenclamide','Repaglinide','nateglinide','sitagliptin','linagliptin']})

I would like to create a new column called op based on the values in the test column, according to the criteria below:

If the test value contains the *etformi* pattern, then the op column should have the value C1.

If the test value contains the gli* pattern, then the op column should have the value C2.

If the test value contains the *nide pattern, then the op column should have the value C3.

If the test value contains the *gliptin pattern, then the op column should have the value C4.

I tried the code below, but it doesn't work as expected, and I'm not sure how to combine the conditions into a single column:

df['test'].str.contains('metformi*', case=False, regex=True)
df['test'].str.contains(('gli*&&*ide'), case=False, regex=True)
df['test'].str.contains(('*nide'), case=False, regex=True)
df['test'].str.contains(('*gliptin'), case=False, regex=True)

I expect my output to be as shown below

Chris Adams · Accepted Answer

You could use numpy.select, supplying a condlist and choicelist:

import numpy as np

condlist = [
    df['test'].str.contains('etformi', case=False, regex=True),
    df['test'].str.contains('^gli.*ide$', case=False, regex=True),
    df['test'].str.contains('nide$', case=False, regex=True),
    df['test'].str.contains('gliptin$', case=False, regex=True)
]

choicelist = ['C1', 'C2', 'C3', 'C4']

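# Conditions are checked in order and the first match wins. An explicit string
# default avoids a dtype clash with the implicit integer default (0), which
# recent NumPy versions can reject when the choices are strings.
df['op'] = np.select(condlist, choicelist, default='')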

[out]

   s_id           test  op
0     1      Metformin  C1
1     2      Glipizide  C2
2     3     Gliclazide  C2
3     4  Glibenclamide  C2
4     5    Repaglinide  C3
5     6    nateglinide  C3
6     7    sitagliptin  C4
7     8    linagliptin  C4
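
As a variation on the code above, the patterns and labels can be kept together in one mapping, which may be easier to extend. This is only a sketch: the rules dict and the 'other' fallback label are illustrative names, not part of the original answer. The comments also show how the shell-style wildcards from the question translate to regex, since in a regex * is a quantifier and a pattern such as '*nide' is rejected with a "nothing to repeat" error.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    's_id': [1, 2, 3, 4, 5, 6, 7, 8],
    'test': ['Metformin', 'Glipizide', 'Gliclazide', 'Glibenclamide',
             'Repaglinide', 'nateglinide', 'sitagliptin', 'linagliptin'],
})

# Hypothetical pattern-to-label table. Order matters: np.select uses the
# first condition that is True for each row.
rules = {
    r'etformi': 'C1',     # *etformi* -> substring anywhere in the value
    r'^gli.*ide$': 'C2',  # gli*ide   -> starts with "gli" and ends with "ide"
    r'nide$': 'C3',       # *nide     -> ends with "nide"
    r'gliptin$': 'C4',    # *gliptin  -> ends with "gliptin"
}

# One boolean Series per pattern, matched case-insensitively.
condlist = [df['test'].str.contains(pat, case=False, regex=True) for pat in rules]
choicelist = list(rules.values())

# 'other' is an assumed placeholder for rows that match none of the rules.
df['op'] = np.select(condlist, choicelist, default='other')
print(df)
```

Because dicts preserve insertion order, the more specific pattern (here '^gli.*ide$') should be listed before any broader pattern that could match the same rows.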
