Shuoqi Zhou
Shuoqi Zhou

Reputation: 37

How can I sort dataframe with str+number

I got a dataframe now and try to sort it by 'No.' dataframe is like:

         No.              value1                  value2
 0     NODE11              1654                     14
 1     NODE10              1710                    817
 2      NODE9              1162                     23
 3      NODE8              1883                    198
 4   no match              2303                      0
 5      NODE1              1098                     67
 6      NODE3              1488                     78
 7      NODE2              2426                     49
 8      NODE5              1974                     90
 9      NODE4              1767                     11
 10     NODE7              1708                     26
 11     NODE6              2418                    248

I tried:

df.to_csv('count.csv',index = False)
df = pd.read_csv('count.csv', thousands=',', parse_dates=['No.'])
df = df.sort_values(by=['No.', 'only can on this node'], ascending=[True, False])
df.to_csv('count.csv',index = False)

Its work but:

         No.              value1                 value2
 5      NODE1              1098                     67
 1     NODE10              1710                    817  -------error
 0     NODE11              1654                     14   -------error
 7      NODE2              2426                     49
 6      NODE3              1488                     78
 9      NODE4              1767                     11
 8      NODE5              1974                     90
 11     NODE6              2418                    248
 10     NODE7              1708                     26
 3      NODE8              1883                    198
 2      NODE9              1162                     23
 4   no match              2303                      0

I dont know how to sort this dataframe because this sort key is str+number. any ideas? Thanks!

Upvotes: 0

Views: 43

Answers (3)

Will
Will

Reputation: 339

When you are sorting by a string, it will sort alphabetically (i.e. 10 before 9).

I suggest you create a reference column, remove "Node", convert to a integer, sort accordingly and remove the reference column.

e.g.

#split into sortable and not
df_to_sort = df.ix[df['No.'] != "no match",:]
df_to_not_sort = df.ix[df['No.'] == "no match",:]

#create a reference column
df_to_sort.ix[:,'reference'] = df_to_sort['No.'].str.replace("NODE","")

#convert type to an integer
df_to_sort.ix[:,'reference'] = df.ix[:,'reference'].astype(int)

#sort the sortable part
df_sorted = df_to_sort.sort_values('reference')

#drop the reference column
df_sorted = df_sorted.drop(['reference'],axis=1)

#append the non sorted (i.e. no match) section to the bottom
df = df_sorted.append([df_to_not_sort])

Hopefully that works!

FYI James' answer is way more succinct, use that.

Upvotes: 0

James
James

Reputation: 36691

Your column is a string and is sorted as such.

However, you can parse the No. column to a new data frame with a column of strings and a columns of integers, sort that data frame, and use the sorted index to rearrange your original data frame.

df2 = df['No.'].str.extractall('([A-Za-z ]+)(\d+)?').reset_index(drop=True)
df2[1] = df2[1].astype(float)
df.loc[df2.sort_values([0,1]).index]

# returns:
         No.  value1  value2
5      NODE1    1098      67
7      NODE2    2426      49
6      NODE3    1488      78
9      NODE4    1767      11
8      NODE5    1974      90
11     NODE6    2418     248
10     NODE7    1708      26
3      NODE8    1883     198
2      NODE9    1162      23
1     NODE10    1710     817
0     NODE11    1654      14
4   no match    2303       0

Upvotes: 1

Robert I
Robert I

Reputation: 1527

Its sorting it alphabetically like a string You will want to rename NODE1 --> NODE01 NODE2 --> NODE02 ... NODE9 --> NODE09 NODE10 --> NODE10

Because NODE1 NODE12 NODE13 NODE1123123 NODE2 NODE3 is sorted alphabetically.

Upvotes: 0

Related Questions