aks2200

Reputation: 73

Keywords search in text column of data frame using dictionary

I am new to Python and have a very specific requirement that I am stuck on due to limited knowledge. I would appreciate it if someone could help with this.

I have generated a dictionary from Excel which looks like this:

dict = {'Fruit' : {'Comb Words' : ['yellow',
                                   'elongated',
                                   'cooking'],
                   'Mandatory Word' : ['banana',
                                       'banana',
                                       'banana']},
       'Animal' : {'Comb Words' : ['mammal',
                                   'white',
                                   'domestic'],
                  'Mandatory Word' : ['cat',
                                      'cat',
                                      'cat']}}

Now, I have a dataframe which has a text column and I want to match keywords from this dictionary with that column. For example:

            Text                     Mandatory      Comb            Final
A white domestic cat is playing        cat       domestic,white     Animal
yellow banana is not available        banana       yellow           Fruit

This dictionary is just an idea; I can change it, since it is an input from Excel. Any other format or approach that produces the output above is fine.

Upvotes: 0

Views: 218

Answers (1)

Rishabh Kumar

Reputation: 2430

Using a user-defined function:

import pandas as pd

Dict = {'Fruit' : {'Comb Words' : ['yellow',
                                   'elongated',
                                   'cooking'],
                   'Mandatory Word' : ['banana',
                                       'banana',
                                       'banana']},
       'Animal' : {'Comb Words' : ['mammal',
                                   'white',
                                   'domestic'],
                  'Mandatory Word' : ['cat',
                                      'cat',
                                      'cat']}}
                                      
df = pd.DataFrame({'Text':['A white domestic cat is playing',
                            'yellow banana is not available']})

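def findMCF(sentence):
    # Scan the sentence word by word for a mandatory word of any category
    for mand in sentence.split():
        for final in Dict:
            wordtypeDict = Dict[final]
            mandList = wordtypeDict['Mandatory Word']
            if mand in mandList:
                # Collect the combination words of that category found in the sentence
                C = [wrd for wrd in sentence.split() if wrd in wordtypeDict['Comb Words']]
                return (mand,','.join(C),final)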

df['Mandatory'],df['Comb'],df['Final'] = zip(*df['Text'].map(findMCF))

print(df)

Output:

                              Text Mandatory            Comb   Final
0  A white domestic cat is playing       cat  white,domestic  Animal
1   yellow banana is not available    banana          yellow   Fruit
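
Note that findMCF returns None when a sentence contains no mandatory word, so the zip(*...) unpacking fails on such rows, and the plain split() match is case- and punctuation-sensitive. A minimal sketch of a more tolerant variant follows. The names keyword_dict and find_mcf_safe are hypothetical, and the lower-casing, the \w+ tokenisation and the empty-string placeholders are assumptions about what is wanted rather than part of the original requirement.

```python
import re

# Same shape as Dict above, with the repeated mandatory words collapsed.
keyword_dict = {
    'Fruit': {'Comb Words': ['yellow', 'elongated', 'cooking'],
              'Mandatory Word': ['banana']},
    'Animal': {'Comb Words': ['mammal', 'white', 'domestic'],
               'Mandatory Word': ['cat']},
}

def find_mcf_safe(sentence):
    # Assumption: matching should ignore case and punctuation,
    # so 'Cat,' is treated the same as 'cat'.
    words = re.findall(r"\w+", sentence.lower())
    for word in words:
        for final, word_types in keyword_dict.items():
            if word in word_types['Mandatory Word']:
                # Keep the combination words in the order they appear in the text.
                comb = [w for w in words if w in word_types['Comb Words']]
                return word, ','.join(comb), final
    # Assumption: rows without any mandatory word get empty placeholders
    # instead of None, so tuple unpacking keeps working.
    return '', '', ''

for text in ['A white domestic Cat, is playing',
             'yellow banana is not available',
             'nothing to match here']:
    print(find_mcf_safe(text))
```

It should drop into the same assignment as before, i.e. zip(*df['Text'].map(find_mcf_safe)).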

Upvotes: 1
