Brussel

Reputation: 27

Use keywords from dataframe to detect if any present in another dataframe or string

I have two problems. The first:

I have one dataframe with category and keywords like this:

  Category                   Keywords
0    Fruit            ['apple', 'pear', 'plum', 'grape']
1    Color            ['red', 'purple', 'green']

Another dataframe like this:

              Summary
0        This is a basket of red apples. They are sour.
1        We found a bushel of fruit. They are red.
2        There is a peck of pears that taste sweet.
3        We have a box of plums.

I want the end result like this:

      Category                                            Summary
0    Fruit, Color     This is a basket of red apples. They are sour.
1           Color     We found a bushel of fruit. They are red.
2           Fruit     There is a peck of pears that taste sweet.
3           Fruit     We have a box of plums.

The second:

I also want to check whether a string contains any of the keywords and, if so, output a list of the matching categories.

Example: sample_sentence = "This line contains a red plum?"

output:

result_list = ['Color', 'Fruit']

EDIT: This is similar to, but not the same as, the following question, which may help as a reference: How do I assign categories in a dataframe if they contain any element from another dataframe?

EDIT2:

I also have another version of first dataframe like this:

  Category                   Filters
0    Fruit  apple, pear, plum, grape
1    Color        red, purple, green

Upvotes: 1

Views: 535

Answers (1)

David Erickson

Reputation: 16683

You can use a list comprehension to achieve this:

Dataframe set-up:

import pandas as pd

df1 = pd.DataFrame({'Category': {0: 'Fruit', 1: 'Color'},
 'Keywords': {0: 'apple,pear,plum,grape', 1: 'red,purple,green'}})
df2 = pd.DataFrame({'Summary': {0: 'This is a basket of red apples. They are sour.',
  1: 'We found a bushel of fruit. They are red.',
  2: 'There is a peck of pears that taste sweet.',
  3: 'We have a box of plums.'}})
df1['Keywords'] = df1['Keywords'].str.split(',')
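
The question's second version of the first dataframe (EDIT2) stores the keywords in a Filters column separated by a comma and a space. Splitting that on ',' alone would leave a leading space on every keyword after the first, so it is probably safer to strip each piece. A minimal sketch, assuming that layout; the name df1_alt is made up for illustration:

```python
import pandas as pd

# Hypothetical name for the question's second layout (EDIT2).
df1_alt = pd.DataFrame({'Category': ['Fruit', 'Color'],
                        'Filters': ['apple, pear, plum, grape',
                                    'red, purple, green']})

# Split on commas, then strip whitespace so no keyword keeps a stray space.
df1_alt['Keywords'] = df1_alt['Filters'].apply(
    lambda s: [k.strip() for k in s.split(',')])
```

After this, df1_alt['Keywords'] holds lists of clean keywords and can be used in place of df1['Keywords'] in the code that follows.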

Code:

# The condition below is a substring test. Or you can use:
# "if str(c) == str(y)" or "if str(c).lower() == str(y).lower()"
df2['Category'] = (df2['Summary'].str.split(' ').apply(
    lambda x: list(set([str(a)
                        for y in x
                        for a, b in zip(df1['Category'], df1['Keywords'])
                        for c in b
                        if str(c) in str(y)]))).str.join(', '))
df2

Output:

Out[1]: 
                                          Summary      Category
0  This is a basket of red apples. They are sour.  Fruit, Color
1       We found a bushel of fruit. They are red.         Color
2      There is a peck of pears that taste sweet.         Fruit
3                         We have a box of plums.         Fruit

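a, b and x iterate through rows (vertically): x is the list of words from one row of df2['Summary'], while a and b are the category and the keyword list from one row of df1. c and y iterate through the lists within those rows (horizontally): y is each word in x, and c is each keyword in b. You have to iterate through the rows vertically before you can iterate through the lists inside them horizontally, which is why all of these variables are needed. zip lets you iterate through two or more columns of the first dataframe simultaneously.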
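
For the second part of the question (checking a single string), the same idea can be wrapped in a plain function. This is only a sketch under some assumptions that are mine and not from the question: the function name find_categories, matching on whole lowercase words, and treating a trailing 's' as a simple plural. Unlike the set-based version above, it returns the categories in a fixed order (the order of the mapping).

```python
import re

# Category -> keywords. With the dataframe above this could be built as
# dict(zip(df1['Category'], df1['Keywords'])).
keyword_map = {
    'Fruit': ['apple', 'pear', 'plum', 'grape'],
    'Color': ['red', 'purple', 'green'],
}

def find_categories(text, keyword_map):
    """Return the categories whose keywords occur in text, in keyword_map order."""
    # Lowercase and keep alphabetic words only, so punctuation
    # ("red." or "plum?") does not get in the way.
    words = re.findall(r"[a-z]+", text.lower())
    found = []
    for category, keywords in keyword_map.items():
        # Assumption: a keyword matches a word equal to it or to its simple
        # plural ("apple" matches "apples"); "red" does not match "covered".
        if any(w == k.lower() or w == k.lower() + 's'
               for w in words for k in keywords):
            found.append(category)
    return found

sample_sentence = "This line contains a red plum?"
result_list = find_categories(sample_sentence, keyword_map)
print(result_list)  # ['Fruit', 'Color']
```

Applied to the second dataframe, something like df2['Summary'].apply(lambda s: ', '.join(find_categories(s, keyword_map))) should produce the same Category column as the output above, with the categories always listed in df1 order.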


Upvotes: 1
