brina
brina

Reputation: 93

how to sort pandas groups based on an aggregate

Suppose I have a dataframe df and used groupby on it. How can I sort these groups? I would like to have the group with the highest median in column B first, the one with lowest median last. I know how to get the median sorted:

df.groupby(by='A')['B'].median().sort_values(ascending=False)

But how would I actually sort the groups (not their median)?

Upvotes: 1

Views: 3959

Answers (2)

Wael Hussein
Wael Hussein

Reputation: 155

My understanding is that you are looking for the groups sorted by the medians.

If that is the case, you can use:

df.groupby(by='A')['B'].median().sort_values(ascending=False).index

Upvotes: 0

EdChum
EdChum

Reputation: 393963

You can use the index from the result of sort_values to index the group:

In [118]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df['group'] = list('aabcc')
df

Out[118]:
          a         b         c group
0  3.260645 -0.636861  0.183312     a
1 -0.409939  1.915902 -2.426166     a
2 -0.522152  1.621288  0.225874     b
3  0.576484  0.499148 -1.037899     c
4  0.626081  0.040262 -1.679669     c

In [128]:    
gp = df.groupby('group')['a'].median()
gp

Out[128]:
group
a    1.425353
b   -0.522152
c    0.601282
Name: a, dtype: float64

In [134]:
order = gp.sort_values(ascending=False)
order

Out[134]:
group
a    1.425353
c    0.601282
b   -0.522152
Name: a, dtype: float64

In [135]:    
gp[order.index]

Out[135]:
group
a    1.425353
c    0.601282
b   -0.522152
Name: a, dtype: float64

EDIT

If you want to use the sort order based on median you can add the median column back to the orig df using transform and then sort the original df:

In [7]:
df['median'] = df.groupby('group')['a'].transform('median')
df

Out[7]:
          a         b         c group    median
0  3.260645 -0.636861  0.183312     a  1.425353
1 -0.409939  1.915902 -2.426166     a  1.425353
2 -0.522152  1.621288  0.225874     b -0.522152
3  0.576484  0.499148 -1.037899     c  0.601282
4  0.626081  0.040262 -1.679669     c  0.601282

In [9]:    
df.loc[df['median'].sort_values(ascending=False).index]

Out[9]:
          a         b         c group    median
1 -0.409939  1.915902 -2.426166     a  1.425353
0  3.260645 -0.636861  0.183312     a  1.425353
4  0.626081  0.040262 -1.679669     c  0.601282
3  0.576484  0.499148 -1.037899     c  0.601282
2 -0.522152  1.621288  0.225874     b -0.522152

Upvotes: 1

Related Questions