Reputation: 139
I have a data frame in pandas and I want to sort it by column.
If I use .sort_values()
like in the below code:
df.sort_values(by='id')
I get the output in the 'id' column as:
1075_2016-06-01_0_1
1075_2016-06-01_10_1
1075_2016-06-01_10_2
1075_2016-06-01_11_1
1075_2016-06-01_11_2
1075_2016-06-01_1_1
1075_2016-06-01_1_2
I expected:
1075_2016-06-01_0_1
1075_2016-06-01_1_1
1075_2016-06-01_1_2
1075_2016-06-01_10_1
1075_2016-06-01_10_2
1075_2016-06-01_11_1
1075_2016-06-01_11_2
What is the best way to do this in pandas?
Upvotes: 2
Views: 423
Reputation: 862406
One possible solution with natsort for get indices of sorting values and change of original DataFrame by loc
:
from natsort import index_natsorted, order_by_index
df2 = df.loc[order_by_index(df.index, index_natsorted(df['id']))]
Or split all values by _
, then convert columns to integer and optionaly to datetimes, sorting for indices and last use loc
with original DataFrame
:
df1 = df['id'].str.split('_', expand=True)
df1[[0,2,3]] = df1[[0,2,3]].astype(int)
df1[1] = pd.to_datetime(df1[1])
df2 = df.loc[df1.sort_values([0,1,2,3]).index]
print (df2)
id
0 1075_2016-06-01_0_1
5 1075_2016-06-01_1_1
6 1075_2016-06-01_1_2
1 1075_2016-06-01_10_1
2 1075_2016-06-01_10_2
3 1075_2016-06-01_11_1
4 1075_2016-06-01_11_2
Next solution with argsort
for sorting and change order by indexing by positions with iloc
for working with any index values:
f = lambda x: [int(x[0]), pd.to_datetime(x[1]), int(x[2]), int(x[3])]
df2 = df.iloc[df['id'].str.split('_').map(f).argsort()]
print (df2)
id
0 1075_2016-06-01_0_1
5 1075_2016-06-01_1_1
6 1075_2016-06-01_1_2
1 1075_2016-06-01_10_1
2 1075_2016-06-01_10_2
3 1075_2016-06-01_11_1
4 1075_2016-06-01_11_2
Upvotes: 1
Reputation: 139
Guys thank you very much! The combination of the two solutions worked:
df['sort_val']= df['id'].str.split('_')
f = lambda x: [int(x[0]), pd.to_datetime(x[1]), int(x[2]), int(x[3])]
df['sort_val'] = df['sort_val'].map(f)
df.sort_values(by='sort_val').drop('sort_val',1)
Upvotes: 1
Reputation: 476557
You can first split the values per underscores, and then sort these, like:
df['sort_val'] = df['id'].str.split('_')
df = df.sort_values('sort_val').drop('sort_val', 1)
The above should work on an arbitrary number of underscore separated values.
This gives us:
>>> df
id
0 1075_2016-06-01_0_1
1 1075_2016-06-01_10_1
2 1075_2016-06-01_10_2
3 1075_2016-06-01_11_1
4 1075_2016-06-01_11_2
5 1075_2016-06-01_1_1
6 1075_2016-06-01_1_2
>>> df['sort_val'] = df['id'].str.split('_')
>>> df = df.sort_values('sort_val').drop('sort_val', 1)
>>> df
id
0 1075_2016-06-01_0_1
5 1075_2016-06-01_1_1
6 1075_2016-06-01_1_2
1 1075_2016-06-01_10_1
2 1075_2016-06-01_10_2
3 1075_2016-06-01_11_1
4 1075_2016-06-01_11_2
Upvotes: 0