Is there an alternative, more efficient way to unstack columns from a multiindex of a pandas dataframe?

Question

I have an object that I got from performing a groupby(["A", "B"] combined with .nlargest(3) function in pandas.

i.e:

df.groupby(["A", "B"])["Column"].nlargest(3).reset_index().unstack()

Now I have 3 values per "A" "B". I did an unstack and it works, but I hit the memory capacity and it crashes sometimes.

I somewhat remember finding a (built-in) solution to this very problem long ago, but couldn't find it again. Apologies if this is a duplicate and thanks in advance!

piterbarg · Accepted Answer

As far as I understand pivot_table should help after some initial prep

create the data:

import numpy as np
np.random.seed(2021)
df = pd.DataFrame({'A':np.random.randint(1,3,15), 'B':np.random.randint(1,3,15), 'C':np.random.normal(0,1,15)})
df

looks like this

    A   B   C
0   1   1   2.044890
1   2   1   1.075268
2   2   1   0.079020
3   1   1   0.493282
4   2   1   -0.791367
5   1   2   -2.130595
6   1   2   0.317206
7   1   2   -1.364617
8   2   2   0.358428
9   1   1   -1.305624
10  2   2   2.020718
11  2   1   -2.686804
12  2   2   0.557872
13  2   1   0.776176
14  1   1   0.202650

then we choose the 3 largest, groupby with cumcount to assign the rank, and pivot on the rank:

df2 = df.groupby(["A", "B"])["C"].nlargest(3).reset_index()
df2['rank'] = df2.groupby(["A", "B"]).cumcount()
pd.pivot_table(df2, values = 'C', index = ['A','B'], columns = 'rank')

this produces


 rank   0           1           2
A   B           
1   1   2.044890    0.493282    0.202650
    2   0.317206    -1.364617   -2.130595
2   1   1.075268    0.776176    0.079020
    2   2.020718    0.557872    0.358428

Please let me know if this is what you are after and if it works memory-wise

Is there an alternative, more efficient way to unstack columns from a multiindex of a pandas dataframe?

Answers (1)

Related Questions