Reputation: 5552
Given a dataframe that looks like this:
A B
2005-09-06 5 -2
2005-09-07 -1 3
2005-09-08 4 5
2005-09-09 -8 2
2005-09-10 -2 -5
2005-09-11 -7 9
2005-09-12 2 8
2005-09-13 6 -5
2005-09-14 6 -5
Is there a pythonic way to create a 2x2 matrix like this:
1 0
1 a b
0 c d
Where:
a = number of obs where the corresponding elements of column A and B are both positive.
b = number of obs where the corresponding elements of column A are positive and negative in column B.
c = number of obs where the corresponding elements of column A are negative and positive in column B.
d = number of obs where the corresponding elements of column A and B are both negative.
For this example the output would be:
1 0
1 2 3
0 3 1
Thanks
Upvotes: 24
Views: 31372
Reputation: 587
Here's a really useful page about the pandas crosstab function:
https://chrisalbon.com/python/data_wrangling/pandas_crosstabs/
So I think for what you'd like to do you should use
import pandas as pd
pd.crosstab(data['A']>0, data['B']>0)
Hope that helps!
Upvotes: 10
Reputation: 1412
Probably easiest to just use the pandas function crosstab
. Borrowing from Dyno Fu above:
import pandas as pd
from StringIO import StringIO
table = """dt A B
2005-09-06 5 -2
2005-09-07 -1 3
2005-09-08 4 5
2005-09-09 -8 2
2005-09-10 -2 -5
2005-09-11 -7 9
2005-09-12 2 8
2005-09-13 6 -5
2005-09-14 6 -5
"""
sio = StringIO(table)
df = pd.read_table(sio, sep=r"\s+", parse_dates=['dt'])
df.set_index("dt", inplace=True)
pd.crosstab(df.A > 0, df.B > 0)
Output:
B False True
A
False 1 3
True 3 2
[2 rows x 2 columns]
Also the table is usable if you want to do a Fisher exact test with scipy.stats
etc:
from scipy.stats import fisher_exact
tab = pd.crosstab(df.A > 0, df.B > 0)
fisher_exact(tab)
Upvotes: 39
Reputation: 9044
import pandas as pd
from StringIO import StringIO
table = """dt A B
2005-09-06 5 -2
2005-09-07 -1 3
2005-09-08 4 5
2005-09-09 -8 2
2005-09-10 -2 -5
2005-09-11 -7 9
2005-09-12 2 8
2005-09-13 6 -5
2005-09-14 6 -5
"""
sio = StringIO(table)
df = pd.read_table(sio, sep=r"\s+", parse_dates=['dt'])
df.set_index("dt", inplace=True)
a = df['A'] > 0
b = df['B'] > 0
df1 = df.groupby([a,b]).count()
print df1["A"].unstack()
output:
B False True
A
False 1 3
True 3 2
this is just lnanenok's answer and using unstack()
to make it more readable. credit should go to lanenok.
Upvotes: 6
Reputation: 2749
Let us call your dataframe data
. Try
a = data['A']>0
b = data['B']>0
data.groupby([a,b]).count()
Upvotes: 22