Reputation: 315
I need to sort a dataframe by one column, which includes a combination of numbers and letters.
df = [{"user": "seth",
"name": "1"},
{"user" : "chris",
"name": "10A"},
{"user" : "aaron",
"name": "4B"},
{"user" : "dan",
"name": "10B"}]
My code:
df1 = df.sort_values(by=['name'])
This gets me:
df1 = [{"user": "seth",
"name": "1"},
{"user" : "chris",
"name": "10A"},
{"user" : "dan",
"name": "10B"},
{"user" : "aaron",
"name": "4B"}]
I want:
df1 = [{"user": "seth",
"name": "1"},
{"user" : "aaron",
"name": "4B"},
{"user" : "chris",
"name": "10A"},
{"user" : "dan",
"name": "10B"}]
I had a different question that was flagged as a similar question, and their code:
df.reindex(index=natsorted(df.name))
It returns a sorted dataframe, but all values have been replaced by NaNs.
df.iloc(natsorted(df.name))
It raises an error:
TypeError: unhashable type: 'list'
Upvotes: 3
Views: 2975
Reputation: 6554
You can now also do (with pandas >= 1.1.0):
import natsort
sorted_df = df1.sort_values("name", key=natsort.natsort_keygen())
Upvotes: 1
Reputation: 153550
sort_values
now has key parameter:df = pd.DataFrame([{"user": "seth",
"name": "1"},
{"user" : "chris",
"name": "10A"},
{"user" : "aaron",
"name": "4B"},
{"user" : "dan",
"name": "10B"}])
df.sort_values('name', key=lambda x: x.str.extract('(\d+)').squeeze().astype(int))
Output:
user name
0 seth 1
2 aaron 4B
1 chris 10A
3 dan 10B
Upvotes: 1
Reputation: 4864
To slightly correct Quang's comment, this works fine
import natsort
df1.iloc[natsort.index_humansorted(df1.name)]
Upvotes: 3
Reputation: 23099
you could use a regular expression and extract the numbers + letters, sort them and assign as a categorical column.
s = df["name"].str.extract("(\d+)?(\w|)")
s[0]= s[0].astype(int)
print(s)
0 1
0 1
1 10 A
2 4 B
3 10 B
df['name'] = pd.Categorical(df['name'],s.sort_values([0,1]).astype(str).agg(''.join,axis=1))
print(df.sort_values('name')
user name
0 seth 1
2 aaron 4B
1 chris 10A
3 dan 10B
Upvotes: 0