Reputation: 12943

Sort dataframe by string length

I want to sort by name length. There doesn't appear to be a key parameter for sort_values so I'm not sure how to accomplish this. Here is a test df:

import pandas as pd
df = pd.DataFrame({'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]})

Upvotes: 57

Answers (6)

matt91t

Reputation: 301

It's worth using the key argument to avoid creating unnecessary columns:

df.sort_values("column_name", ascending=True, key=lambda col: col.str.len())
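One thing to watch with this approach is missing values. As a minimal sketch (the sample frame with a None name is made up for illustration), .str.len() yields NaN for a missing name, and sort_values places those rows according to na_position, which defaults to 'last':

```python
import pandas as pd

# Hypothetical sample data with one missing name
df_missing = pd.DataFrame({"name": ["Steve", None, "Al"], "score": [2, 4, 3]})

# Missing names get a NaN length, so they are placed by na_position
nan_last = df_missing.sort_values("name", key=lambda col: col.str.len())
nan_first = df_missing.sort_values(
    "name", key=lambda col: col.str.len(), na_position="first"
)

print(nan_last)
print(nan_first)
```

If rows without a name should not appear at all, dropping them with dropna(subset=["name"]) before sorting is probably simpler.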

Upvotes: 1

Erfan

Reputation: 42946

Since pandas 1.1.0, DataFrame.sort_values accepts a key argument. Pass it a function (here a lambda) that computes the string length with the .str.len() Series method:

df = pd.DataFrame({
    'name': ['Steve', 'Al', 'Markus', 'Greg'], 
    'score': [2, 4, 2, 3]
})
print(df)

     name  score
0   Steve      2
1      Al      4
2  Markus      2
3    Greg      3

df.sort_values(by="name", key=lambda x: x.str.len())

     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2
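When several names have the same length, a second sort column can break the ties. The key function is applied to each column in by independently, so it has to cope with both columns. A rough sketch, assuming made-up sample data with ties and a hypothetical helper length_or_value:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical sample data with ties in name length
df_ties = pd.DataFrame({
    "name": ["Steve", "Al", "Markus", "Greg", "Bo", "Anna"],
    "score": [2, 4, 2, 3, 5, 1],
})

def length_or_value(col):
    # key is called once per column in `by`: leave numeric columns as they are,
    # map string columns to their lengths
    return col if is_numeric_dtype(col) else col.str.len()

# Shortest names first; among equal lengths, highest score first
result = df_ties.sort_values(
    by=["name", "score"], key=length_or_value, ascending=[True, False]
)
print(result)
```

Here 'Bo' comes before 'Al' because both have length 2 and 'Bo' has the higher score.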

Upvotes: 52

Billy Bonaros

Reputation: 1721

A compact one-liner. Since .index returns labels rather than positions, select the rows with .loc:

df.loc[df.agg({"name": len}).sort_values('name').index]

     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2

Upvotes: 3

Thierry G.

Reputation: 305

@jezrael's answer is great and explains the approach well. Here is the final result:

index_sorted = df.name.str.len().sort_values(ascending=True).index
df_sorted = df.reindex(index_sorted)
df_sorted = df_sorted.reset_index(drop=True)

Upvotes: 3

moshfiqur

Reputation: 2149

I found this solution more intuitive, especially if you want to do something with the string length later on.

df['length'] = df['name'].str.len()
df.sort_values('length', ascending=False, inplace=True)

Your dataframe now has a length column holding the string length of each value in the name column, and the whole dataframe is sorted by that length in descending order.
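If the length is only needed for the sort itself, one possible variant is to add the helper column temporarily and drop it afterwards, which also leaves the original dataframe untouched. A minimal sketch, where "_length" is an arbitrary name chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Steve", "Al", "Markus", "Greg"], "score": [2, 4, 2, 3]})

# "_length" is an arbitrary temporary column name chosen for this sketch
df_sorted = (
    df.assign(_length=df["name"].str.len())   # add the helper column on a copy
      .sort_values("_length", ascending=False)  # longest names first
      .drop(columns="_length")                  # remove the helper again
)
print(df_sorted)
```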

Upvotes: 18

jezrael

Reputation: 863701

You can compute the string lengths with str.len, sort them with sort_values, and pass the resulting index to reindex:

print (df.name.str.len())
0    5
1    2
2    6
3    4
Name: name, dtype: int64

print (df.name.str.len().sort_values())
1    2
3    4
0    5
2    6
Name: name, dtype: int64

s = df.name.str.len().sort_values().index
print (s)
Int64Index([1, 3, 0, 2], dtype='int64')

print (df.reindex(s))
     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2

df1 = df.reindex(s)
df1 = df1.reset_index(drop=True)
print (df1)
     name  score
0      Al      4
1    Greg      3
2   Steve      2
3  Markus      2
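The reindex approach above selects rows by index label. As a sketch of a purely positional alternative, which may be handy when the index is not the default 0..n-1 range, the lengths can be argsorted as a NumPy array and fed to iloc. The string index below is hypothetical and only there to show that labels play no role:

```python
import pandas as pd

# Same data as in the question, but with a hypothetical non-default index
df_labeled = pd.DataFrame(
    {"name": ["Steve", "Al", "Markus", "Greg"], "score": [2, 4, 2, 3]},
    index=["a", "b", "c", "d"],
)

# argsort on the NumPy array returns row positions, which is what iloc expects
positions = df_labeled["name"].str.len().to_numpy().argsort(kind="stable")
df_by_length = df_labeled.iloc[positions]
print(df_by_length)
```

kind="stable" keeps rows with equal name length in their original relative order.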

Upvotes: 53
