How to generate a contingency table with counts using pandas?

Question

Let's say I have data like this:

+-------+--------+--------------+--------+--------------+
| index | used_x | clicked_in_x | used_y | clicked_in_y |
+-------+--------+--------------+--------+--------------+
|     1 | True   | False        | True   | True         |
|     2 | False  | False        | True   | False        |
|     3 | True   | True         | False  | False        |
+-------+--------+--------------+--------+--------------+

And I would like to generate a contingency table using using pandas that shows a table like:

+--------+----------------+----------------+
|        | clicked_from_x | clicked_from_y |
+--------+----------------+----------------+
| used_x |             40 |              3 |
| used_y |              2 |             10 |
+--------+----------------+----------------+

What would be the best way to achieve this? So far I've tried the following using the crosstab method:

import numpy as np
import pandas as pd

size = 20

df = pd.DataFrame({
    'used_x': np.random.choice(a=[False, True], size=size),
    'clicked_from_x': np.random.choice(a=[False, True], size=size),
    'used_y':  np.random.choice(a=[False, True], size=size),
    'clicked_from_y':  np.random.choice(a=[False, True], size=size),
})

pd.crosstab([df['used_x'], df['used_y']], [df['clicked_from_x'], df['clicked_from_y']],  margins=False)

Which yields:

But this is pretty hard to make sense of and not the representation I'd hoped for. Does anyone know how to get my desired outcome, or perhaps an equivalent strategy using pandas?

cs95 · Accepted Answer

We'll be using the almighty dot product subroutine here.

i = df.filter(like='clicked')
j = df.filter(like='used')

j.astype(int).T.dot(i)

        clicked_from_x  clicked_from_y
used_x               6               5
used_y               6               6

How to generate a contingency table with counts using pandas?

Answers (1)

Related Questions