Reputation: 93
I have this dataset:
The cuisine countries in it recur across many rows, and the output I would like is a list of, say, the 5 most popular food ingredients for each country.
My code so far:
import pandas as pd
from collections import Counter
filename="food.json"
food_dataset = pd.read_json(filename)
# getting separate columns
country = food_dataset.loc[:,"country"]
ingredients = food_dataset.loc[:,"ingredients"]
ingredient_counter = Counter(ingredients)  # renamed so the Counter class is not shadowed
most_occur = ingredient_counter.most_common(3)
print(most_occur)
Upvotes: 1
Views: 372
Reputation: 862661
Solution for pandas 0.25+: use DataFrame.explode with GroupBy.apply and a lambda function that takes the first N index values of the counts produced by Series.value_counts (N = 3 below; set N = 5 for the top five):
food_dataset = pd.DataFrame({'cuisine': ['greek', 'southern_us'],
                             'ingredients': [list('andnsndnfndn'),
                                             list('ndnsndnfnsnd')]})
print (food_dataset)
       cuisine                           ingredients
0        greek  [a, n, d, n, s, n, d, n, f, n, d, n]
1  southern_us  [n, d, n, s, n, d, n, f, n, s, n, d]
N = 3
df = (food_dataset.explode("ingredients")
                  .groupby('cuisine')['ingredients']
                  .apply(lambda x: x.value_counts().index[:N].tolist())
                  .reset_index())
print (df)
       cuisine ingredients
0        greek   [n, d, a]
1  southern_us   [n, d, s]
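Since the question says the cuisines recur across rows, here is a minimal sketch of the same explode/groupby idea on made-up data in which 'greek' appears twice. The frame `recipes` and its ingredient names are hypothetical, not taken from the question's file.

```python
import pandas as pd

# Hypothetical data: 'greek' occurs in two rows, like the recurring cuisines in the question.
recipes = pd.DataFrame({
    "cuisine": ["greek", "greek", "southern_us"],
    "ingredients": [["feta", "olive oil", "salt"],
                    ["feta", "olive oil", "oregano"],
                    ["butter", "salt", "butter"]],
})

N = 2
# explode gives one row per ingredient, so counts are pooled across all rows of a cuisine.
exploded = recipes.explode("ingredients")
top = (exploded.groupby("cuisine")["ingredients"]
               .apply(lambda s: s.value_counts().index[:N].tolist())
               .reset_index())
print(top)
```

When two ingredients have the same count, their relative order in the result should not be relied on.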
Alternative solution:
df = (food_dataset.explode("ingredients")
                  .groupby('cuisine')['ingredients']
                  .apply(lambda x: [y[0] for y in Counter(x).most_common(N)])
                  .reset_index())
print (df)
       cuisine ingredients
0        greek   [n, d, a]
1  southern_us   [n, d, s]
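If the counts are wanted along with the names, the Counter idea can also be sketched without explode by keeping one Counter per cuisine and updating it row by row. This is only an illustration on the sample data above; the names `rows` and `per_cuisine` are made up for the sketch.

```python
from collections import Counter, defaultdict

# Same sample data as above, written as (cuisine, ingredients) pairs.
rows = [("greek", list("andnsndnfndn")),
        ("southern_us", list("ndnsndnfnsnd"))]

N = 3
per_cuisine = defaultdict(Counter)
for cuisine, ingredients in rows:
    # update() adds this row's ingredients to the running counts for the cuisine
    per_cuisine[cuisine].update(ingredients)

# most_common returns (ingredient, count) pairs, highest count first
top = {cuisine: counts.most_common(N) for cuisine, counts in per_cuisine.items()}
print(top)
```

With a DataFrame, the pairs could come from something like zip(food_dataset['cuisine'], food_dataset['ingredients']).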
Solution if all values in the cuisine column are unique:
food_dataset['top'] = (food_dataset['ingredients']
                       .apply(lambda x: [y[0] for y in Counter(x).most_common(N)]))
print (food_dataset)
       cuisine                           ingredients        top
0        greek  [a, n, d, n, s, n, d, n, f, n, d, n]  [n, d, a]
1  southern_us  [n, d, n, s, n, d, n, f, n, s, n, d]  [n, d, s]
Upvotes: 1