fraxture
fraxture

Reputation: 5510

How to generate a contingency table with counts using pandas?

Let's say I have data like this:

+-------+--------+--------------+--------+--------------+
| index | used_x | clicked_in_x | used_y | clicked_in_y |
+-------+--------+--------------+--------+--------------+
|     1 | True   | False        | True   | True         |
|     2 | False  | False        | True   | False        |
|     3 | True   | True         | False  | False        |
+-------+--------+--------------+--------+--------------+

And I would like to generate a contingency table using using pandas that shows a table like:

+--------+----------------+----------------+
|        | clicked_from_x | clicked_from_y |
+--------+----------------+----------------+
| used_x |             40 |              3 |
| used_y |              2 |             10 |
+--------+----------------+----------------+

What would be the best way to achieve this? So far I've tried the following using the crosstab method:

import numpy as np
import pandas as pd

size = 20

df = pd.DataFrame({
    'used_x': np.random.choice(a=[False, True], size=size),
    'clicked_from_x': np.random.choice(a=[False, True], size=size),
    'used_y':  np.random.choice(a=[False, True], size=size),
    'clicked_from_y':  np.random.choice(a=[False, True], size=size),
})

pd.crosstab([df['used_x'], df['used_y']], [df['clicked_from_x'], df['clicked_from_y']],  margins=False)

Which yields:

enter image description here

But this is pretty hard to make sense of and not the representation I'd hoped for. Does anyone know how to get my desired outcome, or perhaps an equivalent strategy using pandas?

Upvotes: 1

Views: 580

Answers (1)

cs95
cs95

Reputation: 402553

We'll be using the almighty dot product subroutine here.

i = df.filter(like='clicked')
j = df.filter(like='used')

j.astype(int).T.dot(i)

        clicked_from_x  clicked_from_y
used_x               6               5
used_y               6               6

Upvotes: 1

Related Questions