Reputation: 47
I've been trying to find the top 3 most frequent restaurant names within each type of restaurant.
The columns are:
- rest_type: the type of restaurant
- name: the name of the restaurant
- url: used for counting occurrences
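For reference, a small made-up frame with these columns (not my actual data) might look like this:

import pandas as pd

# hypothetical sample mirroring the columns above; url is unique per row,
# so counting it effectively counts rows
df = pd.DataFrame({'rest_type': ['Cafe', 'Cafe', 'Cafe', 'Bar'],
                   'name': ['Alpha', 'Alpha', 'Beta', 'Gamma'],
                   'url': ['u1', 'u2', 'u3', 'u4']})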
This was the code that ended up working for me after some searching:
df_1 = df.groupby(['rest_type', 'name']).agg('count')
datas = (df_1.groupby(['rest_type'], as_index=False)
             .apply(lambda x: x.sort_values(by='url', ascending=False).head(3))['url']
             .reset_index()
             .rename(columns={'url': 'count'}))
The final output was as follows:
I had a few questions pertaining to the above code:
How are we able to groupby by rest_type again for the datas variable after already grouping by it earlier? Shouldn't it give a missing-column error? The second groupby operation is a bit confusing to me.
What does the newly created first column, level_0, signify? I tried the code with as_index=True and it created both an index and a column for rest_type, so I couldn't reset the index. Output below:
Thank you
Upvotes: 2
Views: 653
Reputation: 81
I have a similar case where the above query only partially works: in my case the co-occurrence value always comes out as 1.
Here is my input data frame.
My query is below:
top_five_family_cooccurence_df = (
    common_top25_cooccurance1_df.groupby('family')
    .apply(lambda x: x['related_family'].value_counts().nlargest(5))
    .reset_index()
    .rename(columns={'related_family': 'cooccurence', 'level_1': 'related_family'})
)
As noted, the co-occurrence count in the output is always 1.
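For illustration, here is a minimal sketch with made-up family values (not my real frame); one situation that produces this symptom is when every (family, related_family) pair occurs only once in the input, because then value_counts has nothing to add up:

import pandas as pd

# hypothetical input: each (family, related_family) pair appears exactly once
common_top25_cooccurance1_df = pd.DataFrame({'family': ['F1', 'F1', 'F2'],
                                             'related_family': ['F2', 'F3', 'F1']})

result = (common_top25_cooccurance1_df.groupby('family')
          .apply(lambda x: x['related_family'].value_counts().nlargest(5)))
print(result)  # every count is 1 because no pair repeats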
Upvotes: 0
Reputation: 260640
You can use groupby on rest_type a second time because, after the first groupby, rest_type lives in the index, and groupby accepts index level names as well as column names.
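A minimal sketch of that point, using a made-up frame with the same columns (none of this is your actual data):

import pandas as pd

df = pd.DataFrame({'rest_type': ['A', 'A', 'B'],
                   'name': ['x', 'x', 'y'],
                   'url': ['u1', 'u2', 'u3']})

df_1 = df.groupby(['rest_type', 'name']).agg('count')
print(df_1.index.names)             # ['rest_type', 'name']: both moved into the index
print('rest_type' in df_1.columns)  # False, yet the next line still works
print(df_1.groupby('rest_type').size())  # groupby matches the index level 'rest_type'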
level_0 comes from the reset_index call: that level of your index is unnamed, so reset_index falls back to the generic name level_0.
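A quick sketch of that behaviour on a toy Series (unrelated to your data):

import pandas as pd

# a Series whose two index levels have no names
s = pd.Series([10, 20],
              index=pd.MultiIndex.from_tuples([(0, 'A'), (1, 'B')]))
print(s.reset_index())
#    level_0 level_1   0
# 0        0       A  10
# 1        1       B  20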
That said, and provided I understand your dataset, I feel that you could achieve your goal more easily:
import random
import pandas as pd

df = pd.DataFrame({'rest_type': random.choices('ABCDEF', k=20),
                   'name': random.choices('abcdef', k=20),
                   'url': range(20),  # looks like this is a unique identifier
                   })
def tops(s, n=3):
    return s.value_counts().sort_values(ascending=False).head(n)

df.groupby('rest_type')['name'].apply(tops, n=3)
Edit: here is an alternative that formats the result as a dataframe with informative column names:
(df.groupby('rest_type')
.apply(lambda x: x['name'].value_counts().nlargest(3))
.reset_index().rename(columns={'name': 'counts', 'level_1': 'name'})
)
Upvotes: 2