Lumos
Lumos

Reputation: 1383

calculate percentage in high dimensional crosstab

I have made a crosstab on 3 variables (position, offer, group). How can I calculate the percentage by the total of 1 variable offer, but not the margins (i.e. normalize by column)?

df = pd.crosstab(df.group, [df.position, df.offer], margins = True)

df

pid offer   position    group
1   accept  left        group1
1   accept  left        group1
1   accept  right       group2
1   reject  right       group2
1   reject  right       group1
2   reject  right       group1
2   reject  left        group2
2   accept  left        group3
3   accept  right       group3
3   reject  right       group1
3   reject  right       group2

my current crosstab:

 position         left                 right          All
 offer          accept   reject    accept   reject        
 group1         2         0           0       3       5
 group2         0         1           1       2       4
 group3         1         0           1       0       2
 All            3         1           2       5       11

expected outcome:

 position         left                 right
 offer          accept   reject    accept   reject       
 group1            1       0         0        1 
 group2            0       1         0.33     0.66  
 group3            1       0         1        0  

Thanks!

Upvotes: 0

Views: 1028

Answers (2)

Zero
Zero

Reputation: 76927

You could use

In [4013]: dfa = df.groupby(['group', 'position', 'offer']).size().unstack(fill_value=0)

In [4014]: dfa.div(dfa.sum(axis=1), axis=0).unstack()
Out[4014]:
offer    accept           reject
position   left     right   left     right
group
group1      1.0  0.000000    0.0  1.000000
group2      0.0  0.333333    1.0  0.666667
group3      1.0  1.000000    0.0  0.000000

You could also have dfa from pivot_table as well.

df.pivot_table(index=['group', 'position'], columns='offer', aggfunc=len)['pid']

Upvotes: 0

cs95
cs95

Reputation: 402553

Add another step, groupby the 0th level along columns and divide c by the sum.

c = pd.crosstab(df.group, [df.position, df.offer])
df = c / c.groupby(level=0, axis=1).sum()
print(df)

position   left            right          
offer    accept reject    accept    reject
group                                     
group1      1.0    0.0  0.000000  1.000000
group2      0.0    1.0  0.333333  0.666667
group3      1.0    0.0  1.000000  0.000000

If you're as much of a perfectionist as I am, you might want the whole numbers as integers, you can do that like this:

df = c.div(c.groupby(level=0, axis=1).sum()).astype(object)
print(df)

position   left            right          
offer    accept reject    accept    reject
group                                     
group1        1      0         0         1
group2        0      1  0.333333  0.666667
group3        1      0         1         0

Upvotes: 1

Related Questions