Reputation: 12943
I want to sort by name length. There doesn't appear to be a key parameter for sort_values, so I'm not sure how to accomplish this. Here is a test df:
import pandas as pd
df = pd.DataFrame({'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]})
Upvotes: 57
Views: 41405
Reputation: 301
It's worth using the key argument of sort_values (available since pandas 1.1.0) to avoid creating unnecessary columns:
df.sort_values("column_name", ascending=True, key=lambda col: col.str.len())
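Applied to the test frame from the question, the call might look like the sketch below. It sorts longest names first to show that ascending still applies after the key transformation; the variable name result is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]})

# key receives the whole column as a Series and must return
# a Series of the same length; the rows are ordered by those values.
result = df.sort_values("name", ascending=False, key=lambda col: col.str.len())

print(result["name"].tolist())  # ['Markus', 'Steve', 'Greg', 'Al']
print(list(result.columns))     # ['name', 'score'] (no helper column added)
```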
Upvotes: 1
Reputation: 42946
Using DataFrame.sort_values, we can pass an anonymous (lambda) function that computes string length (using the .str.len() Series method) to the key argument:
df = pd.DataFrame({
    'name': ['Steve', 'Al', 'Markus', 'Greg'],
    'score': [2, 4, 2, 3]
})
print(df)
     name  score
0   Steve      2
1      Al      4
2  Markus      2
3    Greg      3
df.sort_values(by="name", key=lambda x: x.str.len())
     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2
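When sorting by more than one column, pandas applies the key function to each column in by independently, so the lambda has to decide per column what to return. The sketch below assumes an extended version of the test frame (the extra rows 'Bo' and 'Anna' are made up to create ties in name length) and breaks ties by score.

```python
import pandas as pd

# Extended test data (hypothetical rows added so that some name lengths tie)
df = pd.DataFrame({
    'name': ['Steve', 'Al', 'Markus', 'Greg', 'Bo', 'Anna'],
    'score': [2, 4, 2, 3, 1, 5],
})

# key is called once per column listed in `by`:
# map 'name' to its string length, leave 'score' unchanged.
by_length_then_score = df.sort_values(
    by=["name", "score"],
    key=lambda col: col.str.len() if col.name == "name" else col,
)

print(by_length_then_score["name"].tolist())
# ['Bo', 'Al', 'Greg', 'Anna', 'Steve', 'Markus']
```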
Upvotes: 52
Reputation: 1721
A fancy and minimal solution:
df.iloc[df.agg({"name":len}).sort_values('name').index]
     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2
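Note that .iloc selects by position, so feeding it index labels as above only works when the frame has the default 0..n-1 index. A minimal sketch of a label-based variant, assuming a frame with a custom string index (the labels 'a' to 'd' are made up) and using .str.len() in place of agg:

```python
import pandas as pd

# Same data as the question, but with a non-default index (hypothetical labels)
df = pd.DataFrame(
    {'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]},
    index=['a', 'b', 'c', 'd'],
)

# Index labels ordered by name length
order = df['name'].str.len().sort_values().index

# .loc looks rows up by label, so it works for any unique index
sorted_df = df.loc[order]

print(sorted_df['name'].tolist())  # ['Al', 'Greg', 'Steve', 'Markus']
```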
Upvotes: 3
Reputation: 305
@jezrael's answer is great and explains the approach well. Here is the final result:
index_sorted = df.name.str.len().sort_values(ascending=True).index
df_sorted = df.reindex(index_sorted)
df_sorted = df_sorted.reset_index(drop=True)
Upvotes: 3
Reputation: 2149
I found this solution more intuitive, especially if you want to do something depending on the column length later on.
df['length'] = df['name'].str.len()
df.sort_values('length', ascending=False, inplace=True)
Now your dataframe will have a column named length holding the string length of each value in the name column, and the whole dataframe will be sorted in descending order of that length.
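If you would rather not modify df in place, the same idea can be sketched with assign, which returns a new frame, and the helper column can be dropped once it is no longer needed. The names with_length and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]})

# assign returns a copy with the extra column; df itself is left untouched
with_length = df.assign(length=df['name'].str.len()).sort_values('length', ascending=False)
print(with_length)

# Drop the helper column when it has served its purpose
result = with_length.drop(columns='length')
```

This keeps the length column available for later steps while leaving the original frame as it was.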
Upvotes: 18
Reputation: 863701
You can use reindex with the index of the Series created by str.len and sorted with sort_values:
print(df.name.str.len())
0    5
1    2
2    6
3    4
Name: name, dtype: int64
print(df.name.str.len().sort_values())
1    2
3    4
0    5
2    6
Name: name, dtype: int64
s = df.name.str.len().sort_values().index
print(s)
Int64Index([1, 3, 0, 2], dtype='int64')
print(df.reindex(s))
     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2
df1 = df.reindex(s)
df1 = df1.reset_index(drop=True)
print(df1)
     name  score
0      Al      4
1    Greg      3
2   Steve      2
3  Markus      2
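The steps above can be wrapped into a small helper. This is only a sketch: the function name sort_by_length is made up, it assumes the index labels are unique (as reindex requires), and it passes kind="mergesort" so that rows with equal lengths keep their original relative order, which the default sort does not guarantee. The extra row 'Bo' is added here only to show a tie.

```python
import pandas as pd

def sort_by_length(frame, column):
    """Return a copy of frame sorted by the string length of one column.

    Hypothetical helper; assumes frame has unique index labels.
    """
    # mergesort is a stable algorithm, so ties keep their original order
    order = frame[column].str.len().sort_values(kind="mergesort").index
    return frame.reindex(order).reset_index(drop=True)

df = pd.DataFrame({
    'name': ['Steve', 'Al', 'Markus', 'Greg', 'Bo'],
    'score': [2, 4, 2, 3, 1],
})

df1 = sort_by_length(df, 'name')
print(df1['name'].tolist())  # ['Al', 'Bo', 'Greg', 'Steve', 'Markus']
```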
Upvotes: 53