Reputation: 1982
My Pandas data frame contains the following data:
product,values
a1, 10
a5, 20
a10, 15
a2, 45
a3, 12
a6, 67
I have to sort this data frame based on the product column. Thus, I would like to get the following output:
product,values
a10, 15
a6, 67
a5, 20
a3, 12
a2, 45
a1, 10
Unfortunately, I'm facing the following error:
ErrorDuringImport(path, sys.exc_info())
ErrorDuringImport: problem in views - type 'exceptions.Indentation
Upvotes: 10
Views: 49956
Reputation: 1
import pandas as pd
df = pd.DataFrame({
"product": ['a1,', 'a5,', 'a10,', 'a2,','a3,','a6,'],
"value": [10, 20, 15, 45, 12, 67]
})
df
==>
product value
0 a1, 10
1 a5, 20
2 a10, 15
3 a2, 45
4 a3, 12
5 a6, 67
df.sort_values(by='product', key=lambda col: col.str[1:-1].astype(int), ascending=False)
==>
product value
2 a10, 15
5 a6, 67
1 a5, 20
4 a3, 12
3 a2, 45
0 a1, 10
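The slice `col.str[1:-1]` above relies on every label having exactly one leading character and one trailing comma. As a minimal sketch, assuming the labels look like `a1`, `a10` as in the question, the key can pull the digits out with a regular expression instead. The helper name `numeric_part` is only illustrative, and the `key` argument of `sort_values` needs pandas 1.1.0 or newer.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["a1", "a5", "a10", "a2", "a3", "a6"],
    "value": [10, 20, 15, 45, 12, 67],
})

def numeric_part(col):
    # Hypothetical helper: take the first run of digits in each label
    # and compare the labels by that number instead of by their text.
    return col.str.extract(r"(\d+)", expand=False).astype(int)

# The key is applied to the whole column (a Series), not to single values.
result = df.sort_values(by="product", key=numeric_part, ascending=False)
print(result)
```

Because the key only affects the comparison, the `product` column itself is left unchanged and no helper column has to be dropped afterwards.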
Upvotes: 0
Reputation: 862406
You can first extract the digits with str.extract and cast them to int with astype. Then call sort_values on the helper column sort, and finally drop that column:
# Raw string so that \d is not treated as a string escape sequence.
df['sort'] = df['product'].str.extract(r'(\d+)', expand=False).astype(int)
df.sort_values('sort', inplace=True, ascending=False)
df = df.drop('sort', axis=1)
print(df)
product values
2 a10 15
5 a6 67
1 a5 20
4 a3 12
3 a2 45
0 a1 10
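The helper column is necessary because sort_values on product alone compares the labels as strings, character by character, so a10 ends up between a2 and a1: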
df.sort_values('product', inplace=True, ascending=False)
print(df)
product values
5 a6 67
1 a5 20
4 a3 12
3 a2 45
2 a10 15
0 a1 10
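The same effect can be reproduced without pandas. The sketch below uses plain Python lists and assumes every label is one letter followed by digits, as in the question. It is only meant to show why text comparison and numeric comparison disagree.

```python
products = ["a1", "a5", "a10", "a2", "a3", "a6"]

# Text comparison goes character by character: "a2" > "a10" because "2" > "1".
as_text = sorted(products, reverse=True)

# Numeric comparison: drop the leading letter and compare the rest as an int.
as_numbers = sorted(products, key=lambda p: int(p[1:]), reverse=True)

print(as_text)     # ['a6', 'a5', 'a3', 'a2', 'a10', 'a1']
print(as_numbers)  # ['a10', 'a6', 'a5', 'a3', 'a2', 'a1']
```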
Another option is to use the natsort library:
from natsort import index_natsorted, order_by_index
df = df.reindex(index=order_by_index(df.index, index_natsorted(df['product'], reverse=True)))
print(df)
product values
2 a10 15
5 a6 67
1 a5 20
4 a3 12
3 a2 45
0 a1 10
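If installing natsort is not an option, a rough standard-library substitute is possible. This is a sketch of a hand-written natural-sort key, not what natsort does internally, and the name `natural_key` is made up for the example. It splits each label into text and number chunks so that the number chunks compare as integers.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "product": ["a1", "a5", "a10", "a2", "a3", "a6"],
    "values": [10, 20, 15, 45, 12, 67],
})

def natural_key(text):
    # re.split with a capture group yields alternating chunks:
    # non-digits, digits, non-digits, ... Digit chunks become ints.
    return [int(tok) if tok.isdecimal() else tok
            for tok in re.split(r"(\d+)", text)]

# Order the index labels by the natural key of their product, largest first.
order = sorted(df.index, key=lambda i: natural_key(df.at[i, "product"]), reverse=True)
result = df.loc[order]
print(result)
```

Unlike the digit-extraction approach, this key also takes the text before and after the number into account, so labels with different prefixes are grouped by prefix first.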
Upvotes: 17