Reputation: 746
Pandas data frames can be sorted by values of its columns, but I wanted to sort a data frame by values of a series that I don't want to add to the data frame - although it has the same indexes.
I got my data frame sorted by adding the series to the data frame (as a column), sorting, and removing the column again. In the sample code nprojnpercent
is my data frame and total
is my series:
nprojnpercent["total"]=total
nprojnpercent.sort_values(by="total",ascending=False,inplace=True)
nprojnpercent.pop("total")
It works, but it seems quite weird to me. Is there a simpler way to order a data frame by a series?
Upvotes: 6
Views: 5816
Reputation: 577
A slightly different (possibly robust) approach:
def sort_df(df, by, **kwargs):
if isinstance(by, str):
return df.sort_values(by)
by = pd.Series(by.reset_index(drop=True))
idx = by.sort_values(**kwargs).index
return df.iloc[idx]
Now it doesn't matter if the indices of df (which may be datetime or anything else) and by do not match. You can still sort by the values of by.
Upvotes: 3
Reputation: 109626
sort_values
returns the sorted series, so take the index of this and name it idx
. Because the index of s
corresponds to that of df
, you can use loc
together with idx
to then rearrange the rows based on the sorted value of `s.
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
s = pd.Series(np.random.randn(5), name='C')
>>> df
A B C
0 1.764052 0.400157 0.978738
1 2.240893 1.867558 -0.977278
2 0.950088 -0.151357 -0.103219
3 0.410599 0.144044 1.454274
4 0.761038 0.121675 0.443863
>>> s
0 0.333674
1 1.494079
2 -0.205158
3 0.313068
4 -0.854096
Name: C, dtype: float64
idx = s.sort_values().index
>>> df.loc[idx]
A B C
4 0.761038 0.121675 0.443863
2 0.950088 -0.151357 -0.103219
3 0.410599 0.144044 1.454274
0 1.764052 0.400157 0.978738
1 2.240893 1.867558 -0.977278
Upvotes: 5