Alex T
Alex T

Reputation: 3754

Can I use crosstab to get a pivot table for summation?

I'm using crosstab to sum the sales in given areas by the Publisher. The original dataframe looks like this:

Publisher   NA_Sales    EU_Sales    JP_Sales
1   Nintendo    29.08   3.58    6.81
2   Nintendo    15.68   12.76   3.79
3   Nintendo    15.61   10.93   3.28
4   Nintendo    11.27   8.89    10.22
5   Nintendo    23.20   2.26    4.22

I did it with pivot table now I want to do it using crosstab.

salespivot1=pd.pivot_table(df, index=df.Publisher,
    aggfunc=np.sum).sort_values('NA_Sales', ascending=False)

creates:

          EU_Sales  JP_Sales    NA_Sales
Publisher           
Nintendo    390.05  454.38  775.61
Electronic Arts 373.91  14.35   599.50
Activision  215.90  6.71    432.59
Sony Computer Entertainment 186.56  74.15   266.17
Ubisoft 161.99  7.52    252.74

But using crosstab I cant recreate this dataframe, because it stacks EU_Sales on top of the NA_Sales no matter what I do

salespivot3=pd.crosstab(index=df.Publisher, columns=['NA_Sales', 'EU_Sales'],
    values=df.NA_Sales, aggfunc=sum)

creates:

col_0   NA_Sales
col_1   EU_Sales
Nintendo    775.61
Electronic Arts  599.50
Activision    432.59
Sony Computer Entertainment  266.17
Ubisoft    252.74

How can I recreate the dataframe with crosstab to give same results as pivot?

Upvotes: 2

Views: 10606

Answers (2)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210852

I think your are "misusing" pivot_table and crosstab methods.

Pivot methods suppose to transform your data set from long format to wide format.

Here is a small demo:

Source DF:

In [42]: df
Out[42]:
     A    B      C  D
0  foo  one  small  1
1  foo  one  large  2
2  foo  one  large  2
3  foo  two  small  3
4  foo  two  small  3
5  bar  one  large  4
6  bar  one  small  5
7  bar  two  small  6
8  bar  two  large  7

pivot_table usage:

In [43]: df.pivot_table(values='D', index=['A', 'B'], columns=['C'], aggfunc='sum')
Out[43]:
C        large  small
A   B
bar one    4.0    5.0
    two    7.0    6.0
foo one    4.0    1.0
    two    NaN    6.0

now the same result using pd.crosstab() method:

In [44]: pd.crosstab(index=[df.A,df.B], columns=df.C, values=df.D, aggfunc='sum')
Out[44]:
C        large  small
A   B
bar one    4.0    5.0
    two    7.0    6.0
foo one    4.0    1.0
    two    NaN    6.0

for your sample data set you simply want to groupby + sum:

In [46]: df
Out[46]:
  Publisher  NA_Sales  EU_Sales  JP_Sales
1  Nintendo     29.08      3.58      6.81
2  Nintendo     15.68     12.76      3.79
3  Nintendo     15.61     10.93      3.28
4  Nintendo     11.27      8.89     10.22
5  Nintendo     23.20      2.26      4.22

In [47]: df.groupby('Publisher', as_index=False).sum()
Out[47]:
  Publisher  NA_Sales  EU_Sales  JP_Sales
0  Nintendo     94.84     38.42     28.32

UPDATE: if you by all means want to generate the same data set using crosstab method you can do it this way:

In [63]: x = df.set_index('Publisher').stack().reset_index(name='val')

In [64]: x
Out[64]:
   Publisher   level_1    val
0   Nintendo  NA_Sales  29.08
1   Nintendo  EU_Sales   3.58
2   Nintendo  JP_Sales   6.81
3   Nintendo  NA_Sales  15.68
4   Nintendo  EU_Sales  12.76
5   Nintendo  JP_Sales   3.79
6   Nintendo  NA_Sales  15.61
7   Nintendo  EU_Sales  10.93
8   Nintendo  JP_Sales   3.28
9   Nintendo  NA_Sales  11.27
10  Nintendo  EU_Sales   8.89
11  Nintendo  JP_Sales  10.22
12  Nintendo  NA_Sales  23.20
13  Nintendo  EU_Sales   2.26
14  Nintendo  JP_Sales   4.22

In [65]: pd.crosstab(index=x.Publisher, columns=x.level_1, values=x.val, aggfunc='sum')
Out[65]:
level_1    EU_Sales  JP_Sales  NA_Sales
Publisher
Nintendo      38.42     28.32     94.84

NOTE: we had first to convert your original DF from wide to long format and then back to wide format using pd.crosstab

Upvotes: 3

Nickil Maveli
Nickil Maveli

Reputation: 29711

It's not possible to use pd.crosstab() directly on your current DF unless you reshape them from wide to a long format so that the resulting headers would later serve as subsequent parameters to be passed into it's function call.

Here's a slight hack:

idx = ["Publisher"]
d = pd.melt(df, id_vars=idx)
pd.crosstab(d.Publisher, d.variable, d.value, aggfunc="sum", rownames=idx, colnames=[None])

enter image description here

But honestly, you should be using either a groupby/pivot_table approach which is designed for this exact purpose.

Upvotes: 8

Related Questions