Reputation: 1592
I have the following dataframe:
import numpy as np
col1=['A_100','A_30','A_60','B_100','B_30','B_60']
vals=np.random.randint(0,20,size=6).tolist()
df=pd.DataFrame(list(zip(col1,vals)),columns=['index','vals']).set_index('index')
>>>
vals
index
A_100 5
A_30 0
A_60 5
B_100 9
B_30 1
B_60 9
I want to sort the dataframe so it will be organized based on the letters of the string and then on th nnumerical value, like this:
val
index
A_100 5
A_60 0
A_30 5
B_100 9
B_60 1
B_30 9
I have tried use sort index:
df.sort_index()
but that returned the same dataframe.
I have also tried to do it by reset index and sort values but I get the same result:
df=df.reset_index()
df.sort_values('index')
>>>#nothin changed
df['indexNumber'] = df.index.str.rsplit('_').str[-1].astype(int)
df = df.sort_values(['indexNumber']).drop('indexNumber', axis=1)
>>>#new dataframe but now I have it A,B,A,B and I still need the A to be together and then the B
vals
index
A_30 5
B_30 0
A_60 5
B_60 9
A_100 1
B_100 9
I assume that the reason is that when it sort the values, it looks on the first digit and then 6 is greater than 3 that greater than 1, and for that reason it believes is already "sorted" , and it doesn't read 100 the number 100.
Is there any simple way to make it sort by the string (a-b-c order) and then by the number value order? (100,60,30).
Upvotes: 1
Views: 774
Reputation: 863156
Onne idea is convert index
to Series
with Series.str.split
to DataFrame
, convert second column to integers and sorting both columns and this index is used for change order in original df.index
by DataFrame.reindex
:
df1 = df.index.to_series().str.split('_',expand=True)
df1[1] = df1[1].astype(int)
df1 = df1.sort_values([0, 1], ascending=[True, False])
print (df1)
0 1
index
A_100 A 100
A_60 A 60
A_30 A 30
B_100 B 100
B_60 B 60
B_30 B 30
df = df.reindex(df1.index)
print (df)
vals
index
A_100 0
A_60 12
A_30 13
B_100 12
B_60 6
B_30 6
Upvotes: 1