Pere
Pere

Reputation: 746

Sorting a pandas data frame by a series

Pandas data frames can be sorted by values of its columns, but I wanted to sort a data frame by values of a series that I don't want to add to the data frame - although it has the same indexes.

I got my data frame sorted by adding the series to the data frame (as a column), sorting, and removing the column again. In the sample code nprojnpercent is my data frame and total is my series:

nprojnpercent["total"]=total
nprojnpercent.sort_values(by="total",ascending=False,inplace=True)
nprojnpercent.pop("total")

It works, but it seems quite weird to me. Is there a simpler way to order a data frame by a series?

Upvotes: 6

Views: 5816

Answers (2)

Abhishek Bhatia
Abhishek Bhatia

Reputation: 577

A slightly different (possibly robust) approach:


def sort_df(df, by, **kwargs):
    if isinstance(by, str):
        return df.sort_values(by)
    
    by = pd.Series(by.reset_index(drop=True))
    idx = by.sort_values(**kwargs).index
    
    return df.iloc[idx]

Now it doesn't matter if the indices of df (which may be datetime or anything else) and by do not match. You can still sort by the values of by.

Upvotes: 3

Alexander
Alexander

Reputation: 109626

sort_values returns the sorted series, so take the index of this and name it idx. Because the index of s corresponds to that of df, you can use loc together with idx to then rearrange the rows based on the sorted value of `s.

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
s = pd.Series(np.random.randn(5), name='C')

>>> df
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
2  0.950088 -0.151357 -0.103219
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

>>> s
0    0.333674
1    1.494079
2   -0.205158
3    0.313068
4   -0.854096
Name: C, dtype: float64

idx = s.sort_values().index

>>> df.loc[idx]
          A         B         C
4  0.761038  0.121675  0.443863
2  0.950088 -0.151357 -0.103219
3  0.410599  0.144044  1.454274
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278

Upvotes: 5

Related Questions