Peter Chen

Reputation: 1484

Pandas sort column with numerical string

I have a DataFrame below:

col1

Numb10
Numb11
Numb12
Numb7
Numb8

How can I sort it in numeric order, like this:

col1

Numb7
Numb8
Numb10
Numb11
Numb12

I tried the following, but got TypeError: cannot convert the series to <class 'int'>:

df.sort_values(by = "col1", key = (lambda x: int(x[4:])))

Update with one missing in col1

Upvotes: 2

Views: 1976

Answers (2)

Quang Hoang

Reputation: 150785

The part of each string after the first four characters might not always be an integer. You can check with:

# convert to numeric values (floats, not integers, so invalid entries become NaN)
extracted_nums = pd.to_numeric(df['col1'].str[4:], errors='coerce')

# check for invalid values
# if this prints True, some entries are not numeric
print(extracted_nums.isna().any())

# sort by values
df.loc[extracted_nums.sort_values().index]
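
As a self-contained sketch of the same approach, the snippet below assumes a small frame in which one entry of col1 is missing (the sample data is made up for illustration). The missing entry is coerced to NaN, and sort_values places NaN last by default, so that row ends up at the bottom.

```python
import pandas as pd

# Hypothetical sample data with one missing value in col1
df = pd.DataFrame({"col1": ["Numb10", "Numb11", None, "Numb7", "Numb8"]})

# Take everything after the 4-character prefix and coerce it to numbers;
# anything that is not numeric (including the missing value) becomes NaN
nums = pd.to_numeric(df["col1"].str[4:], errors="coerce")

# Sort the numeric keys (NaN goes last by default), then reorder the frame
result = df.loc[nums.sort_values().index]
print(result)
```

If the rows with missing values should come first instead, sort_values accepts na_position="first".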

Upvotes: 1

akuiper

Reputation: 215057

The key argument of sort_values receives the whole Series, not individual elements. From the docs:

Apply the key function to the values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect a Series and return a Series with the same shape as the input. It will be applied to each column in by independently.

In your case, you can use .str and astype for slicing and type conversion:

df.sort_values(by='col1', key=lambda s: s.str[4:].astype(int))
     col1
3   Numb7
4   Numb8
0  Numb10
1  Numb11
2  Numb12
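
To make the point about the key concrete, here is a minimal sketch that records what the key function actually receives. The helper name trailing_number and the regex are illustrative choices, not part of the original answer. It uses str.extract to pull out the trailing digits, which may be handy if the prefix is not always four characters long. Note also that on a Series, x[4:] slices rows from position 4 onward rather than characters of each string, which is why the original lambda did not do what was intended.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["Numb10", "Numb11", "Numb12", "Numb7", "Numb8"]})

received_types = []  # records the type of the object passed to the key

def trailing_number(s: pd.Series) -> pd.Series:
    # The key is called with the whole column, not with single strings
    received_types.append(type(s).__name__)
    # Extract the trailing run of digits from each string and convert to int
    return s.str.extract(r"(\d+)$", expand=False).astype(int)

sorted_df = df.sort_values(by="col1", key=trailing_number)
print(received_types)
print(sorted_df)
```

The key argument requires pandas 1.1.0 or later. On older versions, sorting a helper column or using df.loc with a sorted index is a workable alternative.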

Upvotes: 7
