Reputation: 1383
I have made a crosstab
on 3 variables (position
, offer
, group
). How can I calculate the percentage by the total of 1 variable offer
, but not the margins (i.e. normalize by column)?
df = pd.crosstab(df.group, [df.position, df.offer], margins = True)
df
pid offer position group
1 accept left group1
1 accept left group1
1 accept right group2
1 reject right group2
1 reject right group1
2 reject right group1
2 reject left group2
2 accept left group3
3 accept right group3
3 reject right group1
3 reject right group2
my current crosstab:
position left right All
offer accept reject accept reject
group1 2 0 0 3 5
group2 0 1 1 2 4
group3 1 0 1 0 2
All 3 1 2 5 11
expected outcome:
position left right
offer accept reject accept reject
group1 1 0 0 1
group2 0 1 0.33 0.66
group3 1 0 1 0
Thanks!
Upvotes: 0
Views: 1028
Reputation: 76927
You could use
In [4013]: dfa = df.groupby(['group', 'position', 'offer']).size().unstack(fill_value=0)
In [4014]: dfa.div(dfa.sum(axis=1), axis=0).unstack()
Out[4014]:
offer accept reject
position left right left right
group
group1 1.0 0.000000 0.0 1.000000
group2 0.0 0.333333 1.0 0.666667
group3 1.0 1.000000 0.0 0.000000
You could also have dfa
from pivot_table
as well.
df.pivot_table(index=['group', 'position'], columns='offer', aggfunc=len)['pid']
Upvotes: 0
Reputation: 402553
Add another step, groupby
the 0th level along columns and divide c
by the sum
.
c = pd.crosstab(df.group, [df.position, df.offer])
df = c / c.groupby(level=0, axis=1).sum()
print(df)
position left right
offer accept reject accept reject
group
group1 1.0 0.0 0.000000 1.000000
group2 0.0 1.0 0.333333 0.666667
group3 1.0 0.0 1.000000 0.000000
If you're as much of a perfectionist as I am, you might want the whole numbers as integers, you can do that like this:
df = c.div(c.groupby(level=0, axis=1).sum()).astype(object)
print(df)
position left right
offer accept reject accept reject
group
group1 1 0 0 1
group2 0 1 0.333333 0.666667
group3 1 0 1 0
Upvotes: 1