Reputation: 1187
I have column col1
in the data frame with following values:
col1 col2 col3
9.1
9.1
9.11
9.12
9.13
9.14
9.15
9.16
9.2
9.3
9.4
9.5
9.6
9.7
9.8
9.9
10.1
10.1
10.2
10.3
Is it possilbe to sort data frame based on col1
values as follows:
col1 col2 col3
9.1
9.2
9.3
9.4
9.5
9.6
9.7
9.8
9.9
9.10
9.11
9.12
9.13
9.14
9.15
9.16
10.1
10.1
10.2
10.3
There are two things here:
9.10
is interpreted as 9.1
which I want avoid.9.10
appear after 9.9
in sort order.Here is example code:
>>> import pandas as pd
>>> pd.DataFrame([9.1,9.7,9.8,9.9,9.10,10.0,10.1,10.2,10.11])
0
0 9.10
1 9.70
2 9.80
3 9.90
4 9.10
5 10.00
6 10.10
7 10.20
8 10.11
>>> df.sort_values(0)
0
0 9.10
4 9.10
1 9.70
2 9.80
3 9.90
5 10.00
6 10.10
8 10.11
7 10.20
I wanted it to be:
0
0 9.1
1 9.7
2 9.8
3 9.9
4 9.10
5 10.0
6 10.1
7 10.2
8 10.11
I am ok if the it shows two digits after decimal point like 9.70
, but the order should be same.
PS: I didnt specify any column type as I am ok with any. My goal is to achieve two points specified above. This column values are actually directory names which I am loading in data frame and trying to sort in the order I specified above.
Upvotes: 1
Views: 104
Reputation: 4761
You must create the dataframe with str
data (I shuffled it randomly):
data = ['9.1', '10.1', '10.2', '10.11', '9.8', '10.0', '9.10', '9.7', '9.9']
df = pd.DataFrame(data, columns = ['col1'])
# col1
#0 9.1
#1 10.1
#2 10.2
#3 10.11
#4 9.8
#5 10.0
#6 9.10
#7 9.7
#8 9.9
Now, you can split the column:
new = df['col1'].str.split('.', expand = True)
# 0 1
#0 9 1
#1 10 1
#2 10 2
#3 10 11
#4 9 8
#5 10 0
#6 9 10
#7 9 7
#8 9 9
Add new
columns to df
and sort it following them. Remember 'new' contains 'str' instances, so you can cast them to int
so you can compare values (in order to sort the dataframe):
df['num0'] = new[0].astype(int)
df['num1'] = new[1].astype(int)
df = df.sort_values(['num0','num1'])
# col1 num0 num1
#0 9.1 9 1
#7 9.7 9 7
#4 9.8 9 8
#8 9.9 9 9
#6 9.10 9 10
#5 10.0 10 0
#1 10.1 10 1
#2 10.2 10 2
#3 10.11 10 11
Optional
If you don't want to keep the columns num0
and num1
, change the last code line for:
df = df.sort_values(['num0','num1'])['col1']
You can also reset the dataframe index with:
df = df.reset_index(drop=True)
Upvotes: 2
Reputation: 2804
Try this:
data = [9.1, 9.1, 9.11, 9.12, 9.13, 9.14, 9.15, 9.16,9.2,9.3,9.4,9.5,9.6,9.7,9.8,9.9,10.1,10.1,10.2,10.3,]
df = pd.DataFrame([[i,"",""] for i in data], columns=["col1", "col2", "col3"]).astype("str")
df.sort_values(by=['col1'], key=lambda x: [(int(i[0]), int(i[-1])) for i in x.str.split(".")], )
Upvotes: 0
Reputation: 1413
df.col1 = df.col1.astype(float)
df = df.sort_values(by='col1')
Upvotes: 0