Reputation: 149
I have a dataframe of something like the following structure:
NDB_No Shrt_Desc Water_(g) Energ_Kcal Protein_(g) ...
01001 BUTTER,WITH SALT 15,87 717 0,85
01002 BUTTER,WHIPPED,W/ SALT 16,72 718 0,49
...
01004 CHEESE,BLUE 42,41 353 21,4 28,74
01005 CHEESE,BRICK 41,11 371 23,24 29,68
I want to get a dataframe that includes only the rows where in the Shrt_Desc
column has items that are in the list to_be_found = [BUTTER, PASTA, ..etc]
but not CHEESE
The word to be found (in the list above) could be anywhere in the Shrt_Desc
, not necessarily in the beginning, like SALT
above.
How should I approach this?
Thanks!
Upvotes: 0
Views: 390
Reputation: 149
The following piece of code resolves the issue (based on @piRSquared hint above).
import pandas as pd
from collections import Counter
food_info = pd.read_excel("ABBREV.xlsx")
dfi_1 = food_info
to_be_found = ['BUTTER', 'CHEESE', 'MILK', 'OIL', 'CORN', 'SALT', 'INF', 'PEPPER', 'PASTA', 'GLUTEN-FREE']
found = []
dfi_6 = dfi_1.Shrt_Desc.str.split(',')
for row in dfi_6.iteritems():
for x in to_be_found:
if x in row[1]:
found.append(x)
print(found)
print(len(found))
c = Counter(found)
print(c)
Upvotes: 0
Reputation: 294536
I would split
the 'Shrt_Desc'
column by comma then see if 'CHEESE' is in each split list. Use this a mask on the DataFrame
to_be_found = set(('SALT', 'BUTTER'))
mask = df.Shrt_Desc.str.split(',').apply(lambda x: to_be_found in x)
df[mask]
Looks like:
NDB_No Shrt_Desc
0 01001 BUTTER,WITH SALT
1 01002 BUTTER,WHIPPED,WITH SALT
Upvotes: 1