Reputation: 1126
I have an object that I got from performing a groupby(["A", "B"] combined with .nlargest(3) function in pandas.
i.e:
df.groupby(["A", "B"])["Column"].nlargest(3).reset_index().unstack()
Now I have 3 values per "A" "B". I did an unstack and it works, but I hit the memory capacity and it crashes sometimes.
I somewhat remember finding a (built-in) solution to this very problem long ago, but couldn't find it again. Apologies if this is a duplicate and thanks in advance!
Upvotes: 2
Views: 712
Reputation: 8219
As far as I understand pivot_table
should help after some initial prep
create the data:
import numpy as np
np.random.seed(2021)
df = pd.DataFrame({'A':np.random.randint(1,3,15), 'B':np.random.randint(1,3,15), 'C':np.random.normal(0,1,15)})
df
looks like this
A B C
0 1 1 2.044890
1 2 1 1.075268
2 2 1 0.079020
3 1 1 0.493282
4 2 1 -0.791367
5 1 2 -2.130595
6 1 2 0.317206
7 1 2 -1.364617
8 2 2 0.358428
9 1 1 -1.305624
10 2 2 2.020718
11 2 1 -2.686804
12 2 2 0.557872
13 2 1 0.776176
14 1 1 0.202650
then we choose the 3 largest, groupby
with cumcount
to assign the rank, and pivot on the rank:
df2 = df.groupby(["A", "B"])["C"].nlargest(3).reset_index()
df2['rank'] = df2.groupby(["A", "B"]).cumcount()
pd.pivot_table(df2, values = 'C', index = ['A','B'], columns = 'rank')
this produces
rank 0 1 2
A B
1 1 2.044890 0.493282 0.202650
2 0.317206 -1.364617 -2.130595
2 1 1.075268 0.776176 0.079020
2 2.020718 0.557872 0.358428
Please let me know if this is what you are after and if it works memory-wise
Upvotes: 3