Reputation: 5510
Let's say I have data like this:
+-------+--------+--------------+--------+--------------+
| index | used_x | clicked_in_x | used_y | clicked_in_y |
+-------+--------+--------------+--------+--------------+
| 1 | True | False | True | True |
| 2 | False | False | True | False |
| 3 | True | True | False | False |
+-------+--------+--------------+--------+--------------+
And I would like to generate a contingency table using using pandas that shows a table like:
+--------+----------------+----------------+
| | clicked_from_x | clicked_from_y |
+--------+----------------+----------------+
| used_x | 40 | 3 |
| used_y | 2 | 10 |
+--------+----------------+----------------+
What would be the best way to achieve this? So far I've tried the following using the crosstab
method:
import numpy as np
import pandas as pd
size = 20
df = pd.DataFrame({
'used_x': np.random.choice(a=[False, True], size=size),
'clicked_from_x': np.random.choice(a=[False, True], size=size),
'used_y': np.random.choice(a=[False, True], size=size),
'clicked_from_y': np.random.choice(a=[False, True], size=size),
})
pd.crosstab([df['used_x'], df['used_y']], [df['clicked_from_x'], df['clicked_from_y']], margins=False)
Which yields:
But this is pretty hard to make sense of and not the representation I'd hoped for. Does anyone know how to get my desired outcome, or perhaps an equivalent strategy using pandas?
Upvotes: 1
Views: 580
Reputation: 402553
We'll be using the almighty dot
product subroutine here.
i = df.filter(like='clicked')
j = df.filter(like='used')
j.astype(int).T.dot(i)
clicked_from_x clicked_from_y
used_x 6 5
used_y 6 6
Upvotes: 1