chippycentra

Reputation: 3432

Add a new column with matching values in a list in pandas

I have a list and a dataframe such as:

the_list =['LjHH','Lhy_kd','Ljk']

COL1 COL2 
A    ADJJDUD878_Lhy_kd
B    Y0_0099JJ_Ljk
C    YTUUDBBDHHD
D    POL0990E_LjHH'

I would like to add a new column, COL3: if a value in COL2 contains an element of the_list, COL3 should hold that matching element.

Expected result:

COL1 COL2               COL3
A    ADJJDUD878_Lhy_kd  Lhy_kd
B    Y0_0099JJ_Ljk      Ljk
C    YTUUDBBDHHD        NA
D    POL0990E_LjHH'     LjHH

Upvotes: 0

Views: 1352

Answers (1)

jezrael

Reputation: 862611

To get only the first matched value, use Series.str.extract with the list values joined by | (regex OR):

the_list =['LjHH','Lhy_kd','Ljk']

df['COL3'] = df['COL2'].str.extract(f'({"|".join(the_list)})', expand=False)
print (df)
  COL1               COL2    COL3
0    A  ADJJDUD878_Lhy_kd  Lhy_kd
1    B      Y0_0099JJ_Ljk     Ljk
2    C        YTUUDBBDHHD     NaN
3    D     POL0990E_LjHH'    LjHH
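
The pattern above treats every list item as a regex, which is fine for these values. As a minimal sketch, assuming the items may someday contain regex metacharacters (such as . or +) or that one item may be a prefix of another, each item can be escaped with re.escape and the longer items tried first. The DataFrame is rebuilt here from the question's data so the snippet runs on its own:

```python
import re
import pandas as pd

the_list = ['LjHH', 'Lhy_kd', 'Ljk']

# Sample data copied from the question.
df = pd.DataFrame({
    'COL1': ['A', 'B', 'C', 'D'],
    'COL2': ['ADJJDUD878_Lhy_kd', 'Y0_0099JJ_Ljk', 'YTUUDBBDHHD', "POL0990E_LjHH'"],
})

# Escape each item so it is matched literally, and put longer items first
# so that a longer item is preferred over a shorter one that is its prefix.
ordered = sorted(the_list, key=len, reverse=True)
pattern = '(' + '|'.join(re.escape(item) for item in ordered) + ')'

# str.extract returns the first match per row, or NaN when nothing matches.
df['COL3'] = df['COL2'].str.extract(pattern, expand=False)
print(df)
```

For the list in the question this gives the same COL3 as the unescaped pattern; the escaping only matters for items that contain special characters.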

To get all matched values (if multiple matches are possible), use Series.str.findall with Series.str.join, and finally replace empty strings with NaN:

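import numpy as np

the_list = ['LjHH', 'Lhy_kd', 'Ljk']

# findall returns a list of matches per row; join them, then turn '' (no match) into NaN
df['COL3'] = df['COL2'].str.findall('|'.join(the_list)).str.join(',').replace('', np.nan)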
print (df)
  COL1               COL2    COL3
0    A  ADJJDUD878_Lhy_kd  Lhy_kd
1    B      Y0_0099JJ_Ljk     Ljk
2    C        YTUUDBBDHHD     NaN
3    D     POL0990E_LjHH'    LjHH
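
The sample data has at most one match per row, so both approaches print the same result. A small sketch with a hypothetical row that contains two list items (the frame name demo and its values are made up for illustration) shows where findall differs:

```python
import numpy as np
import pandas as pd

the_list = ['LjHH', 'Lhy_kd', 'Ljk']

# Hypothetical data: the first row contains two items from the_list,
# the second row contains none.
demo = pd.DataFrame({'COL2': ['X_Ljk_Lhy_kd', 'YTUUDBBDHHD']})

# One list of matches per row, in the order they occur in the string.
matches = demo['COL2'].str.findall('|'.join(the_list))

# Join each list with commas; rows without a match become '' and then NaN.
demo['COL3'] = matches.str.join(',').replace('', np.nan)
print(demo)
```

Here the first row ends up with both matches joined by a comma, while str.extract would keep only the first one.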

Upvotes: 1
