Rnj
Rnj

Reputation: 1187

Sorting data frame based on column value

I have column col1 in the data frame with following values:

col1   col2  col3
9.1
9.1
9.11
9.12
9.13
9.14
9.15
9.16
9.2
9.3
9.4
9.5
9.6
9.7
9.8
9.9
10.1
10.1
10.2
10.3

Is it possilbe to sort data frame based on col1 values as follows:

col1   col2  col3
9.1
9.2
9.3
9.4
9.5
9.6
9.7
9.8
9.9
9.10
9.11
9.12
9.13
9.14
9.15
9.16
10.1
10.1
10.2
10.3

There are two things here:

  1. 9.10 is interpreted as 9.1 which I want avoid.
  2. I want 9.10 appear after 9.9 in sort order.

Here is example code:

>>> import pandas as pd
>>> pd.DataFrame([9.1,9.7,9.8,9.9,9.10,10.0,10.1,10.2,10.11])
       0
0   9.10
1   9.70
2   9.80
3   9.90
4   9.10
5  10.00
6  10.10
7  10.20
8  10.11

>>> df.sort_values(0)
       0
0   9.10
4   9.10
1   9.70
2   9.80
3   9.90
5  10.00
6  10.10
8  10.11
7  10.20

I wanted it to be:

       0
0    9.1
1    9.7
2    9.8
3    9.9
4   9.10
5   10.0
6   10.1
7   10.2
8  10.11

I am ok if the it shows two digits after decimal point like 9.70, but the order should be same.

PS: I didnt specify any column type as I am ok with any. My goal is to achieve two points specified above. This column values are actually directory names which I am loading in data frame and trying to sort in the order I specified above.

Upvotes: 1

Views: 104

Answers (3)

Pablo C
Pablo C

Reputation: 4761

You must create the dataframe with str data (I shuffled it randomly):

data = ['9.1', '10.1', '10.2', '10.11', '9.8', '10.0', '9.10', '9.7', '9.9']
df = pd.DataFrame(data, columns = ['col1'])
#    col1
#0    9.1
#1   10.1
#2   10.2
#3  10.11
#4    9.8
#5   10.0
#6   9.10
#7    9.7
#8    9.9

Now, you can split the column:

new = df['col1'].str.split('.', expand = True)
#    0   1
#0   9   1
#1  10   1
#2  10   2
#3  10  11
#4   9   8
#5  10   0
#6   9  10
#7   9   7
#8   9   9

Add new columns to df and sort it following them. Remember 'new' contains 'str' instances, so you can cast them to int so you can compare values (in order to sort the dataframe):

df['num0'] = new[0].astype(int)
df['num1'] = new[1].astype(int)

df = df.sort_values(['num0','num1'])
#    col1  num0  num1
#0    9.1     9     1
#7    9.7     9     7
#4    9.8     9     8
#8    9.9     9     9
#6   9.10     9    10
#5   10.0    10     0
#1   10.1    10     1
#2   10.2    10     2
#3  10.11    10    11

Optional

If you don't want to keep the columns num0 and num1, change the last code line for:

df = df.sort_values(['num0','num1'])['col1']

You can also reset the dataframe index with:

df = df.reset_index(drop=True)

Upvotes: 2

dimay
dimay

Reputation: 2804

Try this:

data = [9.1, 9.1, 9.11, 9.12, 9.13, 9.14, 9.15, 9.16,9.2,9.3,9.4,9.5,9.6,9.7,9.8,9.9,10.1,10.1,10.2,10.3,]
df = pd.DataFrame([[i,"",""] for i in data], columns=["col1", "col2", "col3"]).astype("str")

df.sort_values(by=['col1'], key=lambda x: [(int(i[0]), int(i[-1])) for i in x.str.split(".")], )

Upvotes: 0

Suhas Mucherla
Suhas Mucherla

Reputation: 1413

df.col1 = df.col1.astype(float)
df = df.sort_values(by='col1')

Upvotes: 0

Related Questions