iboboboru
iboboboru

Reputation: 1102

Create contingency table Pandas with counts and percentages

Is there a better way to create a contingency table in pandas with pd.crosstab() or pd.pivot_table() to generate counts and percentages.

Current solution

cat=['A','B','B','A','B','B','A','A','B','B']
target = [True,False,False,False,True,True,False,True,True,True]

import pandas as pd
df=pd.DataFrame({'cat' :cat,'target':target})

using crosstab

totals=pd.crosstab(df['cat'],df['target'],margins=True).reset_index()
percentages = pd.crosstab(df['cat'],
   df['target']).apply(lambda row: row/row.sum(),axis=1).reset_index()

and a merge

summaryTable=pd.merge(totals,percentages,on="cat")
summaryTable.columns=['cat','#False',
    '#True','All','percentTrue','percentFalse']

output

+---+-----+--------+-------+-----+-------------+--------------+
|   | cat | #False | #True | All | percentTrue | percentFalse |
+---+-----+--------+-------+-----+-------------+--------------+
| 0 | A   |      2 |     2 |   4 | 0.500000    | 0.500000     |
| 1 | B   |      2 |     4 |   6 | 0.333333    | 0.666667     |
+---+-----+--------+-------+-----+-------------+--------------+

Upvotes: 8

Views: 5258

Answers (2)

James
James

Reputation: 673

I don't have a better answer than the merge, but you could replace

percentages = pd.crosstab(df['cat'],
   df['target']).apply(lambda row: row/row.sum(),axis=1).reset_index()

with

percentages = pd.crosstab(df['cat'],df['target']), normalize='index').reset_index()

for a slightly cleaner and maybe easier to read code.

Upvotes: 0

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210852

you can do the following:

In [131]: s = df.groupby('cat').agg({'target': ['sum', 'count']}).reset_index(level=0)

In [132]: s.columns
Out[132]:
MultiIndex(levels=[['target', 'cat'], ['sum', 'count', '']],
           labels=[[1, 0, 0], [2, 0, 1]])

Let's bring order to column names:

In [133]: s.columns = [col[1] if col[1] else col[0] for col in s.columns.tolist()]

In [134]: s
Out[134]:
  cat  sum  count
0   A  2.0      4
1   B  4.0      6

In [135]: s['pctTrue'] = s['sum']/s['count']

In [136]: s['pctFalse'] = 1 - s.pctTrue

In [137]: s
Out[137]:
  cat  sum  count   pctTrue  pctFalse
0   A  2.0      4  0.500000  0.500000
1   B  4.0      6  0.666667  0.333333

Upvotes: 1

Related Questions