Can I use crosstab to get a pivot table for summation?

Question

I'm using crosstab to sum the sales in each region per Publisher. The original DataFrame looks like this:

   Publisher  NA_Sales  EU_Sales  JP_Sales
1   Nintendo     29.08      3.58      6.81
2   Nintendo     15.68     12.76      3.79
3   Nintendo     15.61     10.93      3.28
4   Nintendo     11.27      8.89     10.22
5   Nintendo     23.20      2.26      4.22

I did it with pivot_table; now I want to do the same thing using crosstab.

import pandas as pd
# Sum every sales column per Publisher, largest NA_Sales first
salespivot1 = pd.pivot_table(df, index='Publisher',
    aggfunc='sum').sort_values('NA_Sales', ascending=False)

creates:

                               EU_Sales  JP_Sales  NA_Sales
Publisher
Nintendo                         390.05    454.38    775.61
Electronic Arts                  373.91     14.35    599.50
Activision                       215.90      6.71    432.59
Sony Computer Entertainment      186.56     74.15    266.17
Ubisoft                          161.99      7.52    252.74

But using crosstab I can't recreate this DataFrame, because it stacks EU_Sales on top of NA_Sales no matter what I do:

salespivot3=pd.crosstab(index=df.Publisher, columns=['NA_Sales', 'EU_Sales'],
    values=df.NA_Sales, aggfunc=sum)

creates:

col_0                          NA_Sales
col_1                          EU_Sales
Nintendo                         775.61
Electronic Arts                  599.50
Activision                       432.59
Sony Computer Entertainment      266.17
Ubisoft                          252.74

How can I recreate the dataframe with crosstab to give same results as pivot?

Nickil Maveli · Accepted Answer

It's not possible to use pd.crosstab() directly on your current DF unless you first reshape it from wide to long format, so that the former column headers become ordinary values that can then be passed as arguments to its function call.
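
To make the reshaping step concrete, here is a small sketch that melts the five sample rows from the question into long format; long_df is just an illustrative name. After melting, the old headers NA_Sales, EU_Sales and JP_Sales sit as plain values in a column called variable (pandas' default var_name for melt), which is the kind of label array crosstab can use for its columns argument.

```python
import pandas as pd

# The five sample rows shown in the question
df = pd.DataFrame({
    "Publisher": ["Nintendo"] * 5,
    "NA_Sales": [29.08, 15.68, 15.61, 11.27, 23.20],
    "EU_Sales": [3.58, 12.76, 10.93, 8.89, 2.26],
    "JP_Sales": [6.81, 3.79, 3.28, 10.22, 4.22],
})

# Wide -> long: each sales column turns into rows of (variable, value),
# so the old headers become values in the "variable" column
long_df = pd.melt(df, id_vars=["Publisher"])
print(long_df.head(6))
```

Five rows times three sales columns gives 15 long rows, one per (Publisher, region, sale).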

Here's a slight hack:

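import pandas as pd
idx = ["Publisher"]
# Reshape wide -> long: one row per (Publisher, variable, value)
d = pd.melt(df, id_vars=idx)
# Former headers (d.variable) become the columns; d.value is summed per cell
pd.crosstab(d.Publisher, d.variable, d.value, aggfunc="sum", rownames=idx, colnames=[None])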
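
A self-contained sketch that runs the melt + crosstab approach on the question's five sample rows and checks it against the pivot_table result. It assumes a reasonably recent pandas; via_crosstab and via_pivot are illustrative names, and it clears the column-axis label with columns.name = None, which has the same intended effect as colnames=[None] above.

```python
import pandas as pd

# The five sample rows shown in the question
df = pd.DataFrame({
    "Publisher": ["Nintendo"] * 5,
    "NA_Sales": [29.08, 15.68, 15.61, 11.27, 23.20],
    "EU_Sales": [3.58, 12.76, 10.93, 8.89, 2.26],
    "JP_Sales": [6.81, 3.79, 3.28, 10.22, 4.22],
})

# Wide -> long, then cross-tabulate Publisher against the former headers,
# summing the melted values in each cell
d = pd.melt(df, id_vars=["Publisher"])
via_crosstab = pd.crosstab(d.Publisher, d.variable, d.value, aggfunc="sum")
via_crosstab.columns.name = None  # drop the "variable" axis label

# The question's pivot_table, for comparison
via_pivot = pd.pivot_table(df, index="Publisher", aggfunc="sum")

# Raises AssertionError if the two frames differ
pd.testing.assert_frame_equal(via_crosstab, via_pivot)
print(via_crosstab.sort_values("NA_Sales", ascending=False))
```

Both results list the columns alphabetically (EU_Sales, JP_Sales, NA_Sales), matching the pivot_table output in the question.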

But honestly, you should be using a groupby or pivot_table approach, since those are designed for exactly this purpose.
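
For comparison, a minimal sketch of the groupby route on the same five sample rows; by_group is an illustrative name. Unlike pivot_table, this keeps the columns in the order they are selected instead of sorting them alphabetically.

```python
import pandas as pd

# The five sample rows shown in the question
df = pd.DataFrame({
    "Publisher": ["Nintendo"] * 5,
    "NA_Sales": [29.08, 15.68, 15.61, 11.27, 23.20],
    "EU_Sales": [3.58, 12.76, 10.93, 8.89, 2.26],
    "JP_Sales": [6.81, 3.79, 3.28, 10.22, 4.22],
})

# Sum each sales column per Publisher, then order by NA_Sales as in the question
by_group = (
    df.groupby("Publisher")[["NA_Sales", "EU_Sales", "JP_Sales"]]
    .sum()
    .sort_values("NA_Sales", ascending=False)
)
print(by_group)
```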
