Reputation: 11002
Let's assume we have a table like:
id chr val1 val2
... A 2 10
... B 4 20
... A 3 30
...and we'd like to have a contingency table like this (grouped by chr
, thus using 'A'
and 'B'
as the row indices and then summing up the values for val1
and val2
):
val1 val2 total
A 5 40 45
B 4 20 24
total 9 60 69
How can we achieve this?
pd.crosstab(index=df.chr, columns=["val1", "val2"])
looked quite promising but it just counts the rows and does not sum up the values.
I have also tried (numerous times) to supply the values manually...
pd.crosstab(
index=df.chr.unique(),
columns=["val1", "val2"],
values=[
df.groupby("chr")["val1"],
df.groupby("chr")["val2"]
],
aggfunc=sum
)
...but this always ends up in shape mismatches and when I tried to reshape via NumPy:
values=np.array([
df.groupby("chr")["val1"].values,
df.groupby("chr")["val2"].values
].reshape(-1, 2)
...crosstab
tells me that it expected 1 value instead of the two given for each row.
Upvotes: 2
Views: 4524
Reputation: 2696
What you want is pivot_table
table = pd.pivot_table(df, values=['val1','val2'], index=['char'], aggfunc=np.sum)
table['total'] = table['val1'] + table['val2']
Upvotes: 1
Reputation: 16147
import pandas as pd
df = pd.DataFrame({'chr': {0: 'A', 1: 'B', 2: 'A'},
'val1': {0: 2, 1: 4, 2: 3},
'val2': {0: 10, 1: 20, 2: 30}})
# aggregate values by chr
df = df.groupby('chr').sum().reset_index()
df = df.set_index('chr')
# Column Total
df.loc['total', :] = df.sum()
# Row total
df['total'] = df.sum(axis=1)
Output
val1 val2 total
chr
A 5.0 40.0 45.0
B 4.0 20.0 24.0
total 9.0 60.0 69.0
Upvotes: 3