Sorting a pandas data frame by a series

Question

Pandas data frames can be sorted by the values of their columns, but I want to sort a data frame by the values of a series that I don't want to add to the data frame, even though it has the same index.

I got my data frame sorted by adding the series to the data frame (as a column), sorting, and removing the column again. In the sample code nprojnpercent is my data frame and total is my series:

nprojnpercent["total"] = total
nprojnpercent.sort_values(by="total", ascending=False, inplace=True)
nprojnpercent.pop("total")

It works, but it seems quite weird to me. Is there a simpler way to order a data frame by a series?

Alexander · Accepted Answer

sort_values returns the sorted series, so take its index and call it idx. Because the index of s matches that of df, you can pass idx to loc to rearrange the rows of df according to the sorted values of s.

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
s = pd.Series(np.random.randn(5), name='C')

>>> df
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
2  0.950088 -0.151357 -0.103219
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

>>> s
0    0.333674
1    1.494079
2   -0.205158
3    0.313068
4   -0.854096
Name: C, dtype: float64

# Index labels of s, ordered by ascending value of s
idx = s.sort_values().index

>>> df.loc[idx]
          A         B         C
4  0.761038  0.121675  0.443863
2  0.950088 -0.151357 -0.103219
3  0.410599  0.144044  1.454274
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
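
Applied to the names in the question, the same idea might look like the sketch below. The sample values, column names and index labels are made up for illustration; only nprojnpercent and total come from the question. Passing ascending=False should reproduce the descending order the question asks for, without ever adding total as a column.

```python
import pandas as pd

# Hypothetical stand-ins for the question's data frame and series;
# the values, column names and index labels are invented for this sketch.
nprojnpercent = pd.DataFrame(
    {"proj_a": [10, 20, 30], "proj_b": [0.1, 0.2, 0.3]},
    index=["x", "y", "z"],
)
total = pd.Series([5, 50, 20], index=["x", "y", "z"])

# Sort the series in descending order, then reuse its index to reorder the rows.
sorted_df = nprojnpercent.loc[total.sort_values(ascending=False).index]
print(sorted_df)
```

This returns a new data frame and leaves nprojnpercent untouched, so assign the result back if you want the in-place behaviour of the original code. It also assumes the index labels are unique; with duplicate labels, loc would return repeated rows.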
