Reputation: 3432
I have a dataframe such as :
the_list =['LjHH','Lhy_kd','Ljk']
COL1 COL2
A ADJJDUD878_Lhy_kd
B Y0_0099JJ_Ljk
C YTUUDBBDHHD
D POL0990E_LjHH'
And I would like to add a new COL3
column where if within COL2
I have a match with a value in the_list
, I add in that column the matching element of the_list
.
Expected result;
COL1 COL2 COL3
A ADJJDUD878_Lhy_kd Lhy_kd
B Y0_0099JJ_2_Ljk Ljk
C YTUUDBBDHHD NA
D POL0990E_LjHH' LjHH
Upvotes: 0
Views: 1352
Reputation: 862611
For get only first matched values use Series.str.extract
with joined values of lists by |
for regex or
:
the_list =['LjHH','Lhy_kd','Ljk']
df['COL3'] = df['COL2'].str.extract(f'({"|".join(the_list)})', expand=False)
print (df)
COL1 COL2 COL3
0 A ADJJDUD878_Lhy_kd Lhy_kd
1 B Y0_0099JJ_Ljk Ljk
2 C YTUUDBBDHHD NaN
3 D POL0990E_LjHH' LjHH
For get all matched values (if possible multiple values) use Series.str.findall
with Series.str.join
and last repalce empty string to NaN
s:
the_list =['LjHH','Lhy_kd','Ljk']
df['COL3']=df['COL2'].str.findall(f'{"|".join(the_list)}').str.join(',').replace('',np.nan)
print (df)
COL1 COL2 COL3
0 A ADJJDUD878_Lhy_kd Lhy_kd
1 B Y0_0099JJ_Ljk Ljk
2 C YTUUDBBDHHD NaN
3 D POL0990E_LjHH' LjHH
Upvotes: 1