Reputation: 89
I have the following pandas DataFrame. How can I get it to sort properly? For example, I want Day 2 to come after Day 1, not after Day 11, as seen in Group 2 below.
Upvotes: 1
Views: 194
Reputation: 164773
set_levels + sort_index
The issue is that your index labels are being sorted as strings (lexicographically) rather than numerically, so 'Day 11' comes before 'Day 2'. First convert the 'Day' index level (the second level) to numeric, then sort by index:
# split by whitespace, take last split, convert to integers
new_index_values = df.index.levels[1].str.split().str[-1].astype(int)
# set 'Day' level
df.index = df.index.set_levels(new_index_values, level='Day')
# sort by index
df = df.sort_index()
print(df)
           Value
Group Day
A     0        1
      2        3
      11       2
B     5        5
      7        6
      10       4
Setup
The above demonstration uses this example setup:
import pandas as pd

df = pd.DataFrame({'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'Day': ['Day 0', 'Day 11', 'Day 2', 'Day 10', 'Day 5', 'Day 7'],
                   'Value': [1, 2, 3, 4, 5, 6]}).set_index(['Group', 'Day'])
print(df)
              Value
Group Day
A     Day 0       1
      Day 11      2
      Day 2       3
B     Day 10      4
      Day 5       5
      Day 7       6
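If the 'Day N' labels need to stay in the index rather than being replaced by integers, one possible variation is to sort on a temporary numeric column and then drop it. This is only a sketch under that assumption, not part of the approach above; the names flat, day_num and sorted_df are hypothetical helpers introduced for illustration.

```python
import pandas as pd

# Same example data as in the setup above
df = pd.DataFrame({'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'Day': ['Day 0', 'Day 11', 'Day 2', 'Day 10', 'Day 5', 'Day 7'],
                   'Value': [1, 2, 3, 4, 5, 6]}).set_index(['Group', 'Day'])

# Flatten the index so 'Group' and 'Day' become ordinary columns
flat = df.reset_index()

# 'day_num' is a hypothetical helper column: the integer part of each 'Day N' label
flat['day_num'] = flat['Day'].str.split().str[-1].astype(int)

# Sort by group, then by the numeric day; drop the helper and restore the index
sorted_df = (flat.sort_values(['Group', 'day_num'])
                 .drop(columns='day_num')
                 .set_index(['Group', 'Day']))

print(sorted_df)
```

The original string labels are kept, so anything that looks rows up by 'Day 2' continues to work, at the cost of a round trip through reset_index and set_index.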
Upvotes: 3
Reputation: 88
You need to sort integers instead of strings:
import pandas as pd
x = pd.Series([1,2,3,4,6], index=[3,2,1,11,12])
x.sort_index()
1     3
2     2
3     1
11    4
12    6
dtype: int64
y = pd.Series([1,2,3,4,5], index=['3','2','1','11','12'])
y.sort_index()
1     3
11    4
12    5
2     2
3     1
dtype: int64
I would suggest storing only numbers in that column instead of strings like 'Day..'.
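A minimal sketch of that suggestion, assuming the data starts out as ordinary columns and that every label has the form 'Day N'. The example values and the names raw and numeric_df are illustrative, not taken from the question.

```python
import pandas as pd

# Illustrative data with string day labels in a regular column
raw = pd.DataFrame({'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
                    'Day': ['Day 0', 'Day 11', 'Day 2', 'Day 10', 'Day 5', 'Day 7'],
                    'Value': [1, 2, 3, 4, 5, 6]})

# Keep only the number from each label, stored as an integer
raw['Day'] = raw['Day'].str.split().str[-1].astype(int)

# With integer days in the index, sort_index orders them numerically
numeric_df = raw.set_index(['Group', 'Day']).sort_index()

print(numeric_df)
```

Converting before the index is built means the index never contains the string labels in the first place.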
Upvotes: 2