How to get pandas crosstab to sum up values for multiple columns?

Question

Let's assume we have a table like:

id    chr     val1     val2
...    A        2       10
...    B        4       20
...    A        3       30

...and we'd like to have a contingency table like this (grouped by chr, thus using 'A' and 'B' as the row indices and then summing up the values for val1 and val2):

       val1    val2    total
A        5      40       45
B        4      20       24
total    9      60       69

How can we achieve this?

pd.crosstab(index=df.chr, columns=["val1", "val2"]) looked quite promising but it just counts the rows and does not sum up the values.

I have also tried (numerous times) to supply the values manually...

pd.crosstab(
  index=df.chr.unique(),
  columns=["val1", "val2"],
  values=[
    df.groupby("chr")["val1"],
    df.groupby("chr")["val2"]
  ],
  aggfunc=sum
)

...but this always ends up in shape mismatches and when I tried to reshape via NumPy:

values=np.array([
  df.groupby("chr")["val1"].values,
  df.groupby("chr")["val2"].values
].reshape(-1, 2)

...crosstab tells me that it expected 1 value instead of the two given for each row.

Chris · Accepted Answer

import pandas as pd

df = pd.DataFrame({'chr': {0: 'A', 1: 'B', 2: 'A'},
 'val1': {0: 2, 1: 4, 2: 3},
 'val2': {0: 10, 1: 20, 2: 30}})
# aggregate values by chr
df = df.groupby('chr').sum().reset_index()
df = df.set_index('chr')
# Column Total
df.loc['total', :] = df.sum()
# Row total
df['total'] = df.sum(axis=1)

Output

       val1     val2    total
chr         
A       5.0     40.0    45.0
B       4.0     20.0    24.0
total   9.0     60.0    69.0

How to get pandas crosstab to sum up values for multiple columns?

Answers (2)

Related Questions