Reut
Reut

Reputation: 1592

Sort pandas dataframe on index that is string+digits

I have the following dataframe:

import numpy as np
col1=['A_100','A_30','A_60','B_100','B_30','B_60']
vals=np.random.randint(0,20,size=6).tolist()
df=pd.DataFrame(list(zip(col1,vals)),columns=['index','vals']).set_index('index')

>>>

       vals
index   
A_100   5
A_30    0
A_60    5
B_100   9
B_30    1
B_60    9

I want to sort the dataframe so it will be organized based on the letters of the string and then on th nnumerical value, like this:

          val
index
A_100     5
A_60      0
A_30      5
B_100     9
B_60      1
B_30      9

I have tried use sort index:

df.sort_index()

but that returned the same dataframe.
I have also tried to do it by reset index and sort values but I get the same result:

df=df.reset_index()
df.sort_values('index')
>>>#nothin changed

df['indexNumber'] = df.index.str.rsplit('_').str[-1].astype(int)
df = df.sort_values(['indexNumber']).drop('indexNumber', axis=1)
>>>#new dataframe but now I have it A,B,A,B and I still need the A to be together and then the B
      vals
index   
A_30    5
B_30    0
A_60    5
B_60    9
A_100   1
B_100   9

I assume that the reason is that when it sort the values, it looks on the first digit and then 6 is greater than 3 that greater than 1, and for that reason it believes is already "sorted" , and it doesn't read 100 the number 100.

Is there any simple way to make it sort by the string (a-b-c order) and then by the number value order? (100,60,30).

Upvotes: 1

Views: 774

Answers (1)

jezrael
jezrael

Reputation: 863156

Onne idea is convert index to Series with Series.str.split to DataFrame, convert second column to integers and sorting both columns and this index is used for change order in original df.index by DataFrame.reindex:

df1 = df.index.to_series().str.split('_',expand=True)
df1[1] = df1[1].astype(int)
df1 = df1.sort_values([0, 1], ascending=[True, False])
print (df1)
       0    1
index        
A_100  A  100
A_60   A   60
A_30   A   30
B_100  B  100
B_60   B   60
B_30   B   30

df = df.reindex(df1.index)
print (df)
       vals
index      
A_100     0
A_60     12
A_30     13
B_100    12
B_60      6
B_30      6

Upvotes: 1

Related Questions