Yogi

Reputation: 35

Python: compare unequal data frames, matching text to get a True/False column and a category column

I have the following two data frames

df1

Animal         Categ_Class
--------------------------
Cat            Soft
Dog            Soft
Dinosaur       Hard

df2

Text                               Animal_Exist
-----------------------------------------------
The Cat is purring                  True
Cat drank the milk                  True
Lizard is crawling over the wall    False
The dinosaurs are extinct now       True

The Animal_Exist column in df2 is derived from whether any value of df1.Animal appears in df2.Text.

I need help understanding what code to write so that I get an output like this:

Output

Text                               Animal_Exist   Categ_Class
--------------------------------------------------------------
The Cat is purring                  True          Soft
Cat drank the milk                  True          Soft
Lizard is crawling over the wall    False         NA
The dinosaurs are extinct now       True          Hard

I am new to Python and have been trying this in multiple ways for days. Any help is appreciated.

Regards.

Upvotes: 1

Views: 64

Answers (1)

jezrael

Reputation: 863301

Use Series.str.extract to pull the matching Animal value out of each Text (case-insensitively), convert it to lowercase, and then use Series.map to look up its Categ_Class:

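import re

# Map each lowercased animal name to its class
s = df1.assign(Animal=df1['Animal'].str.lower()).set_index('Animal')['Categ_Class']
# Build one capture group of alternatives, e.g. (cat|dog|dinosaur); re.escape guards against regex metacharacters
pat = f'({"|".join(map(re.escape, s.index))})'
# Extract the first matching animal (case-insensitive), lowercase it, and look up its class
cat = df2['Text'].str.extract(pat, expand=False, flags=re.I).str.lower().map(s)

# Animal_Exist is True wherever a class was found
df2 = df2.assign(Animal_Exist=cat.notna(), Categ_Class=cat)
print(df2)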
                               Text  Animal_Exist Categ_Class
0                The Cat is purring          True        Soft
1                Cat drank the milk          True        Soft
2  Lizard is crawling over the wall         False         NaN
3     The dinosaurs are extinct now          True        Hard
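
For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same extract-then-map approach. The sample frames are rebuilt from the question, and the helper name add_category is hypothetical, introduced only for illustration. It assumes the animal names in df1 are unique (Series.map needs a unique index to look up against) and that a plain substring match is acceptable, so "dinosaurs" counts as a hit for "Dinosaur".

```python
import re

import pandas as pd

# Sample frames rebuilt from the question.
df1 = pd.DataFrame({
    "Animal": ["Cat", "Dog", "Dinosaur"],
    "Categ_Class": ["Soft", "Soft", "Hard"],
})
df2 = pd.DataFrame({
    "Text": [
        "The Cat is purring",
        "Cat drank the milk",
        "Lizard is crawling over the wall",
        "The dinosaurs are extinct now",
    ]
})


def add_category(lookup, texts):
    """Hypothetical helper: tag each Text row with the class of the first animal matched in it."""
    # Lowercased animal name -> class, used as the lookup table.
    mapping = (
        lookup.assign(Animal=lookup["Animal"].str.lower())
        .set_index("Animal")["Categ_Class"]
    )
    # One capture group holding every name as an alternative;
    # re.escape keeps names containing regex metacharacters literal.
    pattern = "(" + "|".join(re.escape(name) for name in mapping.index) + ")"
    # Case-insensitive extraction; rows without a match become NaN.
    found = texts["Text"].str.extract(pattern, flags=re.IGNORECASE, expand=False)
    # Lowercase the match so it lines up with the lowercased lookup index.
    category = found.str.lower().map(mapping)
    return texts.assign(Animal_Exist=category.notna(), Categ_Class=category)


result = add_category(df1, df2)
print(result)
```

If substring hits are not wanted (for example "Cat" inside "Category"), one option is to wrap the alternatives in word boundaries, such as r"\b(" + ... + r")\b", though that would also stop "dinosaurs" from matching "Dinosaur".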

Upvotes: 1
