No94
No94

Reputation: 165

get top n rows per group pandas

I tried to use groupby with pandas but being fairly new to python i can't seem to find a solution

raw_data = {'Products': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'], 
        'Month': ['201903', '201903', '201902', '201901', '201902', '201901', '201902', '201904','201903', '201902', '201904', '201903'], 
        'Sales': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3]}
df = pd.DataFrame(raw_data, columns = ['Products', 'Month', 'Sales'])
df

data looks like so

Products    Month   Sales
0   A           201903  4
1   A           201903  24
2   A           201902  31
3   A           201901  2
4   B           201902  3
5   B           201901  4
6   B           201902  24
7   C           201904  31
8   C           201903  2
9   C           201902  3
10  C           201904  2
11  C           201903  3

and i need , per Product to display latest two months with sum of sales in said months like so

Products    Months  Sales
A           201902  31
A           201903  28
B           201901  4
B           201902  27
C           201903  5
C           201904  33

i'm sorry if everything is not formatted correctly, still new to SO

Thank you

Upvotes: 2

Views: 593

Answers (1)

zipa
zipa

Reputation: 27869

This will do it:

(df.groupby(['Products', 'Month'], as_index=False)
   .sum()
   .sort_values(['Products', 'Sales'],
                ascending=(True,False))
   .groupby('Products')
   .head(2))

  Products   Month  Sales
1        A  201902     31
2        A  201903     28
4        B  201902     27
3        B  201901      4
7        C  201904     33
6        C  201903      5

Upvotes: 2

Related Questions