jeff_new
jeff_new

Reputation: 551

pandas multi-index divide aggregated counts

My raw data is as such

    level0  level1  level2
0   0       A       foo
1   0       A       bar
2   0       B       foo
3   0       B       foo
4   0       B       foo
5   0       B       bar
6   1       A       foo
7   1       A       bar

And then I try to group the counts as such df.groupby(['level0', 'level1', 'level2']).size()

to get this

level0  level1  level2
0       A       foo            1
                bar            1
        B       foo            3
                bar            1
1       A       foo            1
                bar            1
        B       foo            1
                bar            1
2       A       foo            1
                bar            1
        B       foo            1
                bar            1

And now I would simply like to divide the counts of level2 in a ratio of bar/foo to get something like this:

level0  level1      {bar counts/foo counts}
0       A           1.00
        B           0.33
1       A           1.00
        B           1.00
2       A           1.00
        B           1.00

Perhaps I even setup the original groupby wrong but also tried .div, .apply, etc but having a mental block. Thank you for any guidance!

Upvotes: 2

Views: 68

Answers (3)

Soudipta Dutta
Soudipta Dutta

Reputation: 2152

import pandas as pd

# Sample DataFrame
data = {
    'col0': [0, 0, 0, 0, 0, 0, 1, 1],
    'col1': ['A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'],
    'col2': ['X', 'Y', 'X', 'X', 'X', 'Y', 'X', 'Y']
}
df = pd.DataFrame(data)
print(df)

res = df.groupby(['col0','col1','col2']).size().unstack(fill_value = 0).eval('ratio = Y/X')\
.fillna(0).groupby(['col0','col1'])['ratio']\
.max().reset_index().rename(columns = {'ratio' : 'Y/X'})
print(res)
'''
   col0 col1       Y/X
0     0    A  1.000000
1     0    B  0.333333
2     1    A  1.000000
'''

Upvotes: 0

mozway
mozway

Reputation: 262234

unstack and eval:

tmp = df.groupby(['level0', 'level1', 'level2']).size()

out = (tmp.unstack().eval('bar/foo')
          .reset_index(name='bar/foo')
      )

Or div:

tmp = (df.groupby(['level0', 'level1', 'level2'])
         .size().unstack()
      )

out = (tmp['bar']
       .div(tmp['foo'])
       .reset_index(name='bar/foo')
      )

Output:

   level0 level1   bar/foo
0       0      A  1.000000
1       0      B  0.333333
2       1      A  1.000000

Upvotes: 1

Panda Kim
Panda Kim

Reputation: 13257

Code

# your groupby code
tmp = df.groupby(['level0', 'level1', 'level2']).size() 

# get desired output by xs and div
out = tmp.xs('bar', level=2).div(tmp.xs('foo', level=2))

out:

level0  level1
0       A         1.000000
        B         0.333333
1       A         1.000000

Upvotes: 1

Related Questions