sql_knievel

Reputation: 1401

pandas index - sort string index by numeric substring

Working with Pandas in Python 3.8.

Given an Index of string values that looks like this:

import pandas as pd

foo = pd.Index(['score_1', 'score_10', 'score_11', 'score_12', 'score_13', 'score_14',
       'score_15', 'score_16', 'score_17', 'score_18', 'score_19', 'score_2',
       'score_20', 'score_21', 'score_22', 'score_23', 'score_24', 'score_25',
       'score_26', 'score_27', 'score_3', 'score_4', 'score_5', 'score_6',
       'score_7', 'score_8', 'score_9'],
      dtype='object', name='score_field')

What's the "right" way to sort it so that the values are in numerical order, e.g. 'score_1', 'score_2', ..., 'score_9', 'score_10', and so on?

This doesn't work:

foo.sort_values(key=lambda x: int(x.split('_')[1]))
AttributeError: 'Index' object has no attribute 'split'

And this doesn't work:

foo.sort_values(key=lambda val: val.str.split('_').str[1].astype(int))
AttributeError: Can only use .str accessor with string values!

This does work, but feels ugly:

foo = pd.Index(sorted(foo.to_list(), key=lambda x: int(x.split('_')[1])),
      dtype=foo.dtype, name=foo.name)

Upvotes: 1

Views: 284

Answers (1)

anky

Reputation: 75100

Honestly, what you have makes sense to me. However, if you want a pure pandas approach, use Index.str.split and argsort:

foo[foo.str.split('_').str[1].astype(int).argsort()]

Index(['score_1', 'score_2', 'score_3', 'score_4', 'score_5', 'score_6',
   'score_7', 'score_8', 'score_9', 'score_10', 'score_11', 'score_12',
   'score_13', 'score_14', 'score_15', 'score_16', 'score_17', 'score_18',
   'score_19', 'score_20', 'score_21', 'score_22', 'score_23', 'score_24',
   'score_25', 'score_26', 'score_27'],
  dtype='object', name='score_field')
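
To make the one-liner above easier to follow, here is a minimal sketch that breaks it into steps. The three-element `idx` is a made-up stand-in for `foo`, and the intermediate names `nums` and `order` are only for illustration:

```python
import pandas as pd

# Small illustrative index (hypothetical values in the same 'score_N' shape as foo).
idx = pd.Index(['score_1', 'score_10', 'score_2'], name='score_field')

# Step 1: take the part after the underscore and convert it to integers.
nums = idx.str.split('_').str[1].astype(int)

# Step 2: argsort returns the positions that would put those integers in order.
order = nums.argsort()

# Step 3: index the original Index with those positions; the name carries over.
sorted_idx = idx[order]

print(list(sorted_idx))  # ['score_1', 'score_2', 'score_10']
```

Because the original Index is only reordered, not rebuilt, there is no need to pass `dtype` or `name` again as in the `sorted(...)` version from the question.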

Or, if you are okay with a third-party library, use natsort:

import natsort as ns
pd.Index(ns.natsorted(foo), name=foo.name)
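
If adding a dependency is not an option, a similar ordering can be approximated with the standard library. This is only a rough sketch, not how natsort is implemented; `natural_key` is a hypothetical helper name, and it assumes plain labels like the ones in the question (text with runs of ASCII digits):

```python
import re

def natural_key(label):
    # Split into alternating non-digit and digit runs, e.g.
    # 'score_10' -> ['score_', '10', ''], and compare digit runs as integers.
    return [int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', label)]

# Hypothetical sample labels in the same shape as the question's index.
labels = ['score_1', 'score_10', 'score_2']

print(sorted(labels, key=natural_key))  # ['score_1', 'score_2', 'score_10']
```

The sorted list can then be wrapped back into an Index the same way as in the question, for example `pd.Index(sorted(foo, key=natural_key), dtype=foo.dtype, name=foo.name)`.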

Upvotes: 1
