Paradigm
Paradigm

Reputation: 149

Pandas: Search and return data frame that includes specific words in a column

I have a dataframe of something like the following structure:

NDB_No  Shrt_Desc   Water_(g)   Energ_Kcal  Protein_(g) ...   
01001   BUTTER,WITH SALT    15,87   717 0,85  
01002   BUTTER,WHIPPED,W/ SALT  16,72   718 0,49  
...  
01004   CHEESE,BLUE 42,41   353 21,4    28,74  
01005   CHEESE,BRICK    41,11   371 23,24   29,68 

I want to get a dataframe that includes only the rows where in the Shrt_Desc column has items that are in the list to_be_found = [BUTTER, PASTA, ..etc] but not CHEESE
The word to be found (in the list above) could be anywhere in the Shrt_Desc, not necessarily in the beginning, like SALT above.

How should I approach this?
Thanks!

Upvotes: 0

Views: 390

Answers (2)

Paradigm
Paradigm

Reputation: 149

The following piece of code resolves the issue (based on @piRSquared hint above).

import pandas as pd
from collections import Counter

food_info = pd.read_excel("ABBREV.xlsx")
dfi_1 = food_info


to_be_found = ['BUTTER', 'CHEESE', 'MILK', 'OIL', 'CORN', 'SALT', 'INF', 'PEPPER', 'PASTA', 'GLUTEN-FREE']
found = []
dfi_6 =  dfi_1.Shrt_Desc.str.split(',')
for row in dfi_6.iteritems():
    for x in to_be_found:
        if x in row[1]:
            found.append(x)

print(found)
print(len(found))

c = Counter(found)
print(c)

Upvotes: 0

piRSquared
piRSquared

Reputation: 294536

I would split the 'Shrt_Desc' column by comma then see if 'CHEESE' is in each split list. Use this a mask on the DataFrame

to_be_found = set(('SALT', 'BUTTER'))

mask = df.Shrt_Desc.str.split(',').apply(lambda x: to_be_found in x)

df[mask]

Looks like:

  NDB_No                 Shrt_Desc
0  01001          BUTTER,WITH SALT
1  01002  BUTTER,WHIPPED,WITH SALT

Upvotes: 1

Related Questions