Sort value and calculate difference for multiindex pandas dataframe

Question

I have a pandas DataFrame like the one below, where place1 and place2 are the index levels. I want to sort the place2 level by the data value without disrupting the order of the place1 level, so that place2 becomes a, c, b for each place1. Is there a way to do this?

place1 place2 data
 sss     a    100
         b    300
         c    200
 ses     a    400
         b    600
         c    500
  ...   ...   ...
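To reproduce this locally, the example frame can be built as below. This is a sketch that assumes place1 and place2 are the two levels of a MultiIndex and that only the rows shown matter (the "..." rows are left out).

```python
import pandas as pd

# Rebuild the example: a (place1, place2) MultiIndex with a single "data" column.
# from_product keeps the given order, so "sss" comes before "ses" as in the question.
index = pd.MultiIndex.from_product(
    [["sss", "ses"], ["a", "b", "c"]], names=["place1", "place2"]
)
df = pd.DataFrame({"data": [100, 300, 200, 400, 600, 500]}, index=index)
print(df)
```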

I also want to calculate the difference in data between place2 == b and place2 == a for each place1, so that the DataFrame becomes:

place1 place2 data diff
 sss     a    100  200
         b    300
         c    200
 ses     a    400  200
         b    600
         c    500
  ...   ...   ...  ...

Is there an efficient way to calculate the difference like this? Thanks in advance!

user15398259 · Accepted Answer

You can use .groupby() on the place1 level and then calculate the difference within each group:

groups = df.groupby(level=0, group_keys=False, sort=False)
# Select the 'data' column so the lambda receives a Series; compute b - a per place1 group
df['diff'] = groups['data'].transform(lambda s: s.xs('b', level='place2').iloc[0] - s.xs('a', level='place2').iloc[0])

This broadcasts the difference to every row of the group; if you only want it on the first row, as in the question, you can blank out the duplicates afterwards.

>>> df
               data  diff
place1 place2            
sss    a        100   200
       b        300   200
       c        200   200
ses    a        400   200
       b        600   200
       c        500   200
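If you want the difference only on the first row of each group, as in the question's desired output, one vectorized alternative (not the method used above) is to pivot place2 into columns with unstack, subtract column a from column b, and map the result back onto the rows. This sketch assumes every place1 has both an a and a b row; a missing one would produce NaN.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["sss", "ses"], ["a", "b", "c"]], names=["place1", "place2"]
)
df = pd.DataFrame({"data": [100, 300, 200, 400, 600, 500]}, index=index)

# Pivot place2 into columns: one row per place1, columns a, b, c.
wide = df["data"].unstack("place2")
# b - a for every place1 at once, without a Python lambda per group.
diff_per_place1 = wide["b"] - wide["a"]

# Broadcast back onto the original rows by place1 label...
place1 = df.index.get_level_values("place1")
place2 = df.index.get_level_values("place2")
df["diff"] = diff_per_place1.reindex(place1).to_numpy()
# ...then keep the value only on the "a" row (other rows become NaN).
df["diff"] = df["diff"].where(place2 == "a")
print(df)
```

Because the other rows hold NaN, the diff column becomes a float column.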

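You can then sort by the data column within each group. Because the groupby was created with sort=False and group_keys=False, the original place1 order is kept and no extra index level is added: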

>>> groups.apply(lambda x: x.sort_values('data'))
               data  diff
place1 place2            
sss    a        100   200
       c        200   200
       b        300   200
ses    a        400   200
       c        500   200
       b        600   200
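On large frames, groupby().apply calls a Python function once per group. A sketch that avoids this (a swapped-in technique, not the answer's method) encodes place1 by order of first appearance with pd.factorize and then sorts all rows at once with np.lexsort, so place1 keeps its original order while data is sorted within each group.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["sss", "ses"], ["a", "b", "c"]], names=["place1", "place2"]
)
df = pd.DataFrame({"data": [100, 300, 200, 400, 600, 500]}, index=index)

# Integer codes for place1 in order of first appearance (sss -> 0, ses -> 1),
# so groups are not re-ordered alphabetically.
group_codes, _ = pd.factorize(df.index.get_level_values("place1"))

# np.lexsort treats the LAST key as the primary key:
# sort by group code first, then by data within each group.
order = np.lexsort((df["data"].to_numpy(), group_codes))
sorted_df = df.iloc[order]
print(sorted_df)
```

np.lexsort is a stable sort, so rows with equal data values keep their original relative order.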
