Reputation: 2920
I have a pandas DataFrame in the following format:
import pandas as pd
d = {'item_code': ['A', 'B', 'C', 'A', 'A', 'B', 'B', 'A', 'C'], 'year': ['2010', '2010', '2010', '2010', '2010', '2011', '2011', '2011', '2011']}
df = pd.DataFrame(data=d)
df
This is what my dataframe looks like:
  item_code  year
0         A  2010
1         B  2010
2         C  2010
3         A  2010
4         A  2010
5         B  2011
6         B  2011
7         A  2011
8         C  2011
I have used groupby to count how often each item occurs in each year:
df.groupby(['year', 'item_code']).size()
This is the result:
year  item_code
2010  A            3
      B            1
      C            1
2011  A            1
      B            2
      C            1
dtype: int64
I want to get the top item for each year, where "top" means the item that occurs most often. For example, for 2010 the top item is A, and for 2011 the top item is B. How can I get that?
And let's say I want to get the top N items for each year. How can I do that too?
Upvotes: 3
Views: 250
Reputation: 862761
You can use value_counts, which counts the items in each group and sorts them by count in descending order, then keep the first N rows of each group:
N = 2
df1 = df.groupby('year')['item_code'].apply(lambda x: x.value_counts().iloc[:N])
#alternative
#df1 = df.groupby('year')['item_code'].apply(lambda x: x.value_counts().head(N))
print(df1)
year
2010  A    3
      B    1
2011  B    2
      A    1
Name: item_code, dtype: int64
Another solution with groupby + head:
N = 2
df1 = df.groupby(['year'])['item_code'].value_counts().groupby('year').head(N)
print(df1)
year  item_code
2010  A            3
      B            1
2011  B            2
      A            1
Name: item_code, dtype: int64
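If a flat DataFrame is easier to work with than a MultiIndex Series, the same idea can be sketched as below. This is only one possible arrangement: the helper name `top_n_items` and the `count` column name are made up for illustration, and ties are broken alphabetically by `item_code`, which is an assumption the question does not specify.

```python
import pandas as pd

df = pd.DataFrame({
    "item_code": ["A", "B", "C", "A", "A", "B", "B", "A", "C"],
    "year": ["2010", "2010", "2010", "2010", "2010", "2011", "2011", "2011", "2011"],
})


def top_n_items(frame, n):
    """Return the n most frequent item codes per year as a flat DataFrame.

    Hypothetical helper: the name and the 'count' column are illustrative.
    """
    # Count occurrences of every (year, item_code) pair.
    counts = (
        frame.groupby(["year", "item_code"])
        .size()
        .reset_index(name="count")
    )
    # Highest counts first within each year; ties broken by item_code (assumption).
    counts = counts.sort_values(
        ["year", "count", "item_code"], ascending=[True, False, True]
    )
    # head(n) keeps the first n rows of each year in the sorted order.
    return counts.groupby("year").head(n).reset_index(drop=True)


top2 = top_n_items(df, 2)
print(top2)
```

Calling `top_n_items(df, 1)` gives just the single top item per year.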
Upvotes: 3
Reputation: 30605
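Use a double groupby, i.e. count each (year, item_code) pair, sort the counts in descending order, then group by year again and take the first row of each group: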
df.groupby(['year', 'item_code']).size().sort_values(ascending=False).groupby(level=0).head(1)
year  item_code
2010  A            3
2011  B            2
dtype: int64
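When only the single top item per year is needed, an alternative that avoids sorting is to look up the row with the largest count in each year via `idxmax`. The sketch below assumes the counts have been flattened into a frame with a `count` column (that column name is an illustrative choice, not part of the original code). If two items tie for the maximum, `idxmax` returns the first one it encounters.

```python
import pandas as pd

df = pd.DataFrame({
    "item_code": ["A", "B", "C", "A", "A", "B", "B", "A", "C"],
    "year": ["2010", "2010", "2010", "2010", "2010", "2011", "2011", "2011", "2011"],
})

# One row per (year, item_code) pair with its number of occurrences.
counts = df.groupby(["year", "item_code"]).size().reset_index(name="count")

# For each year, find the row label holding the largest count,
# then select those rows from the flat frame.
top = counts.loc[counts.groupby("year")["count"].idxmax()]
print(top)
```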
Upvotes: 2