Pandas sort data frame by logical day

I have the following resulting pandas DataFrame. How can I get it to sort properly? For example, Day 2 should come after Day 1, not after Day 11, as currently happens in Group 2 below.

Upvotes: 1

Views: 194

Answers (2)

jpp

Reputation: 164773

set_levels + sort_index

The issue is that your 'Day' labels are being sorted as strings rather than numerically. First convert the 'Day' index level (the second level of the MultiIndex) to numeric, then sort by index:

# split by whitespace, take last split, convert to integers
new_index_values = df.index.levels[1].str.split().str[-1].astype(int)

# set 'Day' level
df.index = df.index.set_levels(new_index_values, level='Day')

# sort by index
df = df.sort_index()

print(df)

           Value
Group Day       
A     0        1
      2        3
      11       2
B     5        5
      7        6
      10       4

Setup

The above demonstration uses this example setup:

import pandas as pd

df = pd.DataFrame({'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'Day': ['Day 0', 'Day 11', 'Day 2', 'Day 10', 'Day 5', 'Day 7'],
                   'Value': [1, 2, 3, 4, 5, 6]}).set_index(['Group', 'Day'])

print(df)

              Value
Group Day          
A     Day 0       1
      Day 11      2
      Day 2       3
B     Day 10      4
      Day 5       5
      Day 7       6
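If you would rather keep the original 'Day N' labels in the index, one possible alternative (not part of the approach above) is the key argument of sort_index, available in pandas 1.1 and later. The sketch below reuses the setup data; the helper name day_number is made up for illustration, and it assumes every 'Day' label ends in an integer.

```python
import pandas as pd

df = pd.DataFrame({'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'Day': ['Day 0', 'Day 11', 'Day 2', 'Day 10', 'Day 5', 'Day 7'],
                   'Value': [1, 2, 3, 4, 5, 6]}).set_index(['Group', 'Day'])


def day_number(level):
    # Hypothetical helper: for a MultiIndex, sort_index calls the key
    # once per level, so only the 'Day' level is converted to integers.
    if level.name == 'Day':
        return level.str.split().str[-1].astype(int)
    return level


# The key only affects the ordering; the index labels stay as strings.
sorted_df = df.sort_index(key=day_number)
print(sorted_df)
```

The trade-off is that the index still holds strings, so any later sort_index call without the key falls back to string ordering.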

Upvotes: 3

PyJan

Reputation: 88

You need to sort integers instead of strings:

import pandas as pd
x = pd.Series([1,2,3,4,6], index=[3,2,1,11,12])
x.sort_index()

1     3
2     2
3     1
11    4
12    6
dtype: int64

y = pd.Series([1,2,3,4,5], index=['3','2','1','11','12'])
y.sort_index()

1     3
11    4
12    5
2     2
3     1
dtype: int64

I would suggest storing only numbers in the column rather than strings such as 'Day ...'.
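As a rough sketch of that suggestion, the string index from the second example can be converted to integers before sorting. This assumes every label parses cleanly as an integer; the name y_sorted is only for illustration.

```python
import pandas as pd

y = pd.Series([1, 2, 3, 4, 5], index=['3', '2', '1', '11', '12'])

# Convert the string labels to integers so the sort is numeric.
y.index = y.index.astype(int)

y_sorted = y.sort_index()
print(y_sorted)
```

With an integer index, 2 and 3 now sort ahead of 11 and 12.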

Upvotes: 2
