Reputation: 31963
Is it feasible to sort pandas dataframe by values of a column, but also by index?
If you sort a pandas dataframe by values of a column, you can get the resultant dataframe sorted by the column, but unfortunately, you see the order of your dataframe's index messy within the same value of a sorted column.
So, can I sort a dataframe by a column, such as the column named count
but also sort it by the value of index? And is it also feasible to sort a column by descending order, but whereas sort a index by ascending order?
I know how to sort multiple columns in dataframe, and also know I can achieve what I'm asking here by first reset_index()
the index and sort it, and then create the index again. But is it more intuitive and efficient way to do it?
Upvotes: 81
Views: 90992
Reputation: 26211
I landed here searching for how to sort on both index and columns. While this matches the title of this question, I see by reading it in more detail that there are additional bits (some parts ascending, other descending). I have a different reason: I would like a consistent and total ordering for a DataFrame, so I'd like to sort it by all levels of the index and all columns. (We are saving them, in partitions or whole, into a content-addressable storage, where same content => same address and with consistent ordering, we achieve good dedup).
The other answers here seem to modify the index, or assume certain "nice" conditions about the index names and the columns.
The idea in this answer is to rename temporarily the levels of the index (as some indexes come with a mix of named and unnamed levels), and rename the columns as well (as some DataFrames may have duplicate column names and other issues), then sort by all levels and all columns, and revert all the renaming.
Since it's not as fast as I would like, we can first check if the index values are unique. If so, then there is no need to also sort by columns. Putting it all together:
def full_sort(df):
if df.index.is_unique:
return df.sort_index()
tmp_level_names = [f'a{i}' for i in range(len(df.index.names))]
tmp_col_names = [f'c{i}' for i in range(len(df.columns))]
return (
df
.rename_axis(index=tmp_level_names)
.set_axis(tmp_col_names, axis=1)
.sort_values(tmp_level_names + tmp_col_names)
.rename_axis(index=df.index.names)
.set_axis(df.columns, axis=1)
)
Upvotes: 0
Reputation: 4594
Pandas 0.23 finally gets you there :-D
You can now pass index names (and not only column names) as parameters to sort_values
. So, this one-liner works:
df = df.sort_values(by = ['MyCol', 'MyIdx'], ascending = [False, True])
And if your index is currently unnamed:
df.index.name = 'MyIdx'
df = df.sort_values(by = ['MyCol', 'MyIdx'], ascending = [False, True])
Upvotes: 115
Reputation: 22716
In pandas 0.23+ you can do it directly - see OmerB's answer. If you don't yet have 0.23+, read on.
I'd venture that the simplest way is to just copy your index over to a column, and then sort by both.
df['colFromIndex'] = df.index
df = df.sort(['count', 'colFromIndex'])
I'd also prefer to be able to just do something like df.sort(['count', 'index'])
, but of course that doesn't work.
Upvotes: 34
Reputation: 507
As of pandas version 0.22.
You can temporarily set the column as an index, sort the index on that column and then reset. By default it will maintain the order of the existing index:
df = df.set_index('column_name', append=True).sort_index(level=1).reset_index(level=1)
I think the above could be done with 'inplace' options but I think it's easier to read as above.
Upvotes: 8
Reputation: 153460
You can use the ascending parameter in sort_index, but you must pass it as a list for it to work correctly as of pandas 0.22.0.
import pandas as pd
import numpy as np
df = pd.DataFrame({'idx_0':[2]*6+[1]*5,
'idx_1':[6,4,2,10,18,5,11,1,7,9,3],
'value_1':np.arange(11,0,-1),
'MyName':list('SORTEDFRAME')})
df = df.set_index(['idx_0','idx_1'])
df
Output:
MyName value_1
idx_0 idx_1
2 6 S 11
4 O 10
2 R 9
10 T 8
18 E 7
5 D 6
1 11 F 5
1 R 4
7 A 3
9 M 2
3 E 1
Sorting by values and index should get "FRAMESORTED" instead of "SORTEDFRAME"
df.sort_values('value_1', ascending=False)\
.sort_index(level=0, ascending=[True])
Output:
MyName value_1
idx_0 idx_1
1 11 F 5
1 R 4
7 A 3
9 M 2
3 E 1
2 6 S 11
4 O 10
2 R 9
10 T 8
18 E 7
5 D 6
Note you must pass ascending
parameter in sort_index
as a list and not as a scalar. It will not work.
Upvotes: 5
Reputation: 15432
You can use a combination of groupby and apply:
In [2]: df = pd.DataFrame({
'transID': range(8),
'Location': ['New York','Chicago','New York','New York','Atlanta','Los Angeles',
'Chicago','Atlanta'],
'Sales': np.random.randint(0,10000,8)}).set_index('transID')
In [3]: df
Out[3]:
Location Sales
transID
0 New York 1082
1 Chicago 1664
2 New York 692
3 New York 5669
4 Atlanta 7715
5 Los Angeles 987
6 Chicago 4085
7 Atlanta 2927
In [4]: df.groupby('Location').apply(lambda d: d.sort()).reset_index('Location',drop=True)
Out[4]:
Location Sales
transID
4 Atlanta 7715
7 Atlanta 2927
1 Chicago 1664
6 Chicago 4085
5 Los Angeles 987
0 New York 1082
2 New York 692
3 New York 5669
I drop 'Location' at in the last line because groupby inserts the grouped levels into the first positions in the index. Sorting and then dropping them preserves the sorted order.
Upvotes: -1
Reputation: 10139
To sort a column descending, while maintaining the index ascending:
import pandas as pd
df = pd.DataFrame(index=range(5), data={'c': [4,2,2,4,2]})
df.index = df.index[::-1]
print df.sort(column='c', ascending=False)
Output:
c
1 4
4 4
0 2
2 2
3 2
Upvotes: 1