Reputation: 1401
I'm working with pandas in Python 3.8.
Given an Index of string values that looks like this:
import pandas as pd
foo = pd.Index(['score_1', 'score_10', 'score_11', 'score_12', 'score_13', 'score_14',
                'score_15', 'score_16', 'score_17', 'score_18', 'score_19', 'score_2',
                'score_20', 'score_21', 'score_22', 'score_23', 'score_24', 'score_25',
                'score_26', 'score_27', 'score_3', 'score_4', 'score_5', 'score_6',
                'score_7', 'score_8', 'score_9'],
               dtype='object', name='score_field')
What's the "right" way to sort it so that the values are in numerical order, e.g. 'score_1', 'score_2', ..., 'score_9', 'score_10', etc.?
This doesn't work:
foo.sort_values(key=lambda x: int(x.split('_')[1]))
AttributeError: 'Index' object has no attribute 'split'
And this doesn't work:
foo.sort_values(key=lambda val: val.str.split('_').str[1].astype(int))
AttributeError: Can only use .str accessor with string values!
This does work, but feels ugly:
foo = pd.Index(sorted(foo.to_list(), key=lambda x: int(x.split('_')[1])),
               dtype=foo.dtype, name=foo.name)
Upvotes: 1
Views: 284
Reputation: 75100
Honestly, what you have makes sense to me. However, if you want a pure pandas approach, use Index.str.split and argsort:
foo[foo.str.split('_').str[1].astype(int).argsort()]
Index(['score_1', 'score_2', 'score_3', 'score_4', 'score_5', 'score_6',
'score_7', 'score_8', 'score_9', 'score_10', 'score_11', 'score_12',
'score_13', 'score_14', 'score_15', 'score_16', 'score_17', 'score_18',
'score_19', 'score_20', 'score_21', 'score_22', 'score_23', 'score_24',
'score_25', 'score_26', 'score_27'],
dtype='object', name='score_field')
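If this comes up more than once, the split-and-argsort idea above could be wrapped in a small helper. This is only a sketch: the function name `sort_index_by_suffix` and its `sep` parameter are made up for illustration, and it assumes every label ends in the separator followed by an integer.

```python
import pandas as pd


def sort_index_by_suffix(idx: pd.Index, sep: str = "_") -> pd.Index:
    """Return idx reordered by the integer after the last `sep` in each label.

    Hypothetical helper; assumes every label ends in `sep` plus digits.
    """
    # Split once from the right so labels such as 'my_score_10' also work.
    suffix = idx.str.rsplit(sep, n=1).str[-1].astype(int)
    # argsort yields the positions that put the integer suffixes in order;
    # indexing the original Index with them keeps its name.
    return idx[suffix.argsort()]


foo = pd.Index(["score_1", "score_10", "score_11", "score_2", "score_3"],
               name="score_field")
result = sort_index_by_suffix(foo)
print(result.tolist())
```

Because the result is produced by indexing the original Index, there is no need to pass the name back in by hand as in the `sorted(...)` version.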
Or, if you are okay with a third-party library:
import natsort as ns
pd.Index(ns.natsorted(foo), name=foo.name)
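If adding a dependency is not an option, a rough natural-sort key can be written with the standard library alone. This is a simplified sketch, not how natsort is implemented; `natural_key` is a made-up name, and it only handles plain runs of digits (no signs, decimals, or locale rules).

```python
import re


def natural_key(label: str):
    """Split a label into text and integer chunks so numbers compare numerically."""
    # With a capturing group, re.split keeps the digit runs in the result:
    # text chunks land at even positions, digit chunks at odd positions.
    parts = re.split(r"(\d+)", label)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


labels = ["score_10", "score_2", "score_1", "score_11"]
ordered = sorted(labels, key=natural_key)
print(ordered)
```

With the Index from the question, this would be used as pd.Index(sorted(foo, key=natural_key), name=foo.name).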
Upvotes: 1