Reputation: 4725
Given the following data:
import pandas as pd
import io
df = pd.read_csv(
io.StringIO(
"bit,val\nbit_0,40.9\nbit_1,49.6\nbit_2,50.5\nbit_3,37.7\nbit_4,52.0\nbit_5,55.1\nbit_6,40.6\nbit_7,37.8\nbit_8,39.2\nbit_9,51.1\nbit_10,48.4\nbit_11,49.8\nbit_12,51.7\nbit_13,46.7\nbit_14,40.8\nbit_15,41.1\nbit_16,36.7\nbit_17,50.8\nbit_18,41.6\nbit_19,41.3\n"
)
)
df = df.sample(len(df), random_state=1).reset_index(drop=True)
which looks as:
bit val
0 bit_3 37.7
1 bit_16 36.7
2 bit_6 40.6
3 bit_10 48.4
4 bit_2 50.5
5 bit_14 40.8
6 bit_4 52.0
7 bit_17 50.8
8 bit_7 37.8
9 bit_1 49.6
10 bit_13 46.7
11 bit_0 40.9
12 bit_19 41.3
13 bit_18 41.6
14 bit_9 51.1
15 bit_15 41.1
16 bit_8 39.2
17 bit_12 51.7
18 bit_11 49.8
19 bit_5 55.1
I would like to sort the data by the bit
column, based on the trailing digit.
If this was a standard python list then the following would work:
sorted(df["bit"].to_list(), key=lambda x: int(x.split("_")[-1]))
I'm not sure how to apply this to a dataframe though.
Upvotes: 2
Views: 3506
Reputation: 12818
With pandas >= 1.1.0, you can use key
just like in sorted.
In my solution I sort on the bit column, but for the sorting I throw out the bit_
:
df.sort_values(
by='bit',
key=lambda x: x.str.replace('bit_', '').astype(int),
)
bit val
11 bit_0 40.9
9 bit_1 49.6
4 bit_2 50.5
0 bit_3 37.7
6 bit_4 52.0
Docs on .sort_values()
:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html
Upvotes: 1
Reputation: 34086
You can use str.extract
with Series.argsort
and df.loc
:
In [1038]: ix = df.bit.str.extract('(\d+)', expand=False).astype(int).argsort().tolist()
In [1039]: df.loc[ix]
Out[1039]:
bit val
11 bit_0 40.9
9 bit_1 49.6
4 bit_2 50.5
0 bit_3 37.7
6 bit_4 52.0
19 bit_5 55.1
2 bit_6 40.6
8 bit_7 37.8
16 bit_8 39.2
14 bit_9 51.1
3 bit_10 48.4
18 bit_11 49.8
17 bit_12 51.7
10 bit_13 46.7
5 bit_14 40.8
15 bit_15 41.1
1 bit_16 36.7
7 bit_17 50.8
13 bit_18 41.6
12 bit_19 41.3
Upvotes: 0
Reputation: 15488
Use df.sort_values
and .str.split("_",expand=True)
and casting to int with .astype(int)
like this:
df.sort_values('bit',key=lambda x: x.str.split("_",expand=True)[1].astype(int))
Output:
bit val
11 bit_0 40.9
9 bit_1 49.6
4 bit_2 50.5
0 bit_3 37.7
6 bit_4 52.0
19 bit_5 55.1
2 bit_6 40.6
8 bit_7 37.8
16 bit_8 39.2
14 bit_9 51.1
3 bit_10 48.4
18 bit_11 49.8
17 bit_12 51.7
10 bit_13 46.7
5 bit_14 40.8
15 bit_15 41.1
1 bit_16 36.7
7 bit_17 50.8
13 bit_18 41.6
12 bit_19 41.3
If you need to reset index, just add .reset_index(drop=True)
:
df.sort_values('bit',key=lambda x: x.str.split("_",expand=True)[1].astype(int)).reset_index(drop=True)
Output:
bit val
0 bit_0 40.9
1 bit_1 49.6
2 bit_2 50.5
3 bit_3 37.7
4 bit_4 52.0
5 bit_5 55.1
6 bit_6 40.6
7 bit_7 37.8
8 bit_8 39.2
9 bit_9 51.1
10 bit_10 48.4
11 bit_11 49.8
12 bit_12 51.7
13 bit_13 46.7
14 bit_14 40.8
15 bit_15 41.1
16 bit_16 36.7
17 bit_17 50.8
18 bit_18 41.6
19 bit_19 41.3
Upvotes: 2
Reputation: 323326
Try with natsort
from natsort import index_natsorted
df = df.iloc[index_natsorted(df.bit)]
df
Out[195]:
bit val
11 bit_0 40.9
9 bit_1 49.6
4 bit_2 50.5
0 bit_3 37.7
6 bit_4 52.0
19 bit_5 55.1
2 bit_6 40.6
8 bit_7 37.8
16 bit_8 39.2
14 bit_9 51.1
3 bit_10 48.4
18 bit_11 49.8
17 bit_12 51.7
10 bit_13 46.7
5 bit_14 40.8
15 bit_15 41.1
1 bit_16 36.7
7 bit_17 50.8
13 bit_18 41.6
12 bit_19 41.3
Upvotes: 2
Reputation: 11171
One performant method is to create a series sorted how you wish, then pass that index to the dataframe:
# create series of bit integers, sort them
bit_vals = df.bit.str.split("_", expand=True).loc[:, 1].astype(int)
sort_series = bit_vals.sort_values()
# pass back to dataframe
df = df.iloc[sort_series.index]
result:
bit val
11 bit_0 40.9
9 bit_1 49.6
4 bit_2 50.5
0 bit_3 37.7
6 bit_4 52.0
19 bit_5 55.1
2 bit_6 40.6
8 bit_7 37.8
16 bit_8 39.2
14 bit_9 51.1
3 bit_10 48.4
18 bit_11 49.8
17 bit_12 51.7
10 bit_13 46.7
5 bit_14 40.8
15 bit_15 41.1
1 bit_16 36.7
7 bit_17 50.8
13 bit_18 41.6
12 bit_19 41.3
You can reset the dataframe index as desired
Upvotes: 0