Reputation: 37
I got a dataframe now and try to sort it by 'No.' dataframe is like:
No. value1 value2
0 NODE11 1654 14
1 NODE10 1710 817
2 NODE9 1162 23
3 NODE8 1883 198
4 no match 2303 0
5 NODE1 1098 67
6 NODE3 1488 78
7 NODE2 2426 49
8 NODE5 1974 90
9 NODE4 1767 11
10 NODE7 1708 26
11 NODE6 2418 248
I tried:
df.to_csv('count.csv',index = False)
df = pd.read_csv('count.csv', thousands=',', parse_dates=['No.'])
df = df.sort_values(by=['No.', 'only can on this node'], ascending=[True, False])
df.to_csv('count.csv',index = False)
Its work but:
No. value1 value2
5 NODE1 1098 67
1 NODE10 1710 817 -------error
0 NODE11 1654 14 -------error
7 NODE2 2426 49
6 NODE3 1488 78
9 NODE4 1767 11
8 NODE5 1974 90
11 NODE6 2418 248
10 NODE7 1708 26
3 NODE8 1883 198
2 NODE9 1162 23
4 no match 2303 0
I dont know how to sort this dataframe because this sort key is str+number. any ideas? Thanks!
Upvotes: 0
Views: 43
Reputation: 339
When you are sorting by a string, it will sort alphabetically (i.e. 10 before 9).
I suggest you create a reference column, remove "Node", convert to a integer, sort accordingly and remove the reference column.
e.g.
#split into sortable and not
df_to_sort = df.ix[df['No.'] != "no match",:]
df_to_not_sort = df.ix[df['No.'] == "no match",:]
#create a reference column
df_to_sort.ix[:,'reference'] = df_to_sort['No.'].str.replace("NODE","")
#convert type to an integer
df_to_sort.ix[:,'reference'] = df.ix[:,'reference'].astype(int)
#sort the sortable part
df_sorted = df_to_sort.sort_values('reference')
#drop the reference column
df_sorted = df_sorted.drop(['reference'],axis=1)
#append the non sorted (i.e. no match) section to the bottom
df = df_sorted.append([df_to_not_sort])
Hopefully that works!
FYI James' answer is way more succinct, use that.
Upvotes: 0
Reputation: 36691
Your column is a string and is sorted as such.
However, you can parse the No.
column to a new data frame with a column of strings and a columns of integers, sort that data frame, and use the sorted index to rearrange your original data frame.
df2 = df['No.'].str.extractall('([A-Za-z ]+)(\d+)?').reset_index(drop=True)
df2[1] = df2[1].astype(float)
df.loc[df2.sort_values([0,1]).index]
# returns:
No. value1 value2
5 NODE1 1098 67
7 NODE2 2426 49
6 NODE3 1488 78
9 NODE4 1767 11
8 NODE5 1974 90
11 NODE6 2418 248
10 NODE7 1708 26
3 NODE8 1883 198
2 NODE9 1162 23
1 NODE10 1710 817
0 NODE11 1654 14
4 no match 2303 0
Upvotes: 1
Reputation: 1527
Its sorting it alphabetically like a string You will want to rename NODE1 --> NODE01 NODE2 --> NODE02 ... NODE9 --> NODE09 NODE10 --> NODE10
Because NODE1 NODE12 NODE13 NODE1123123 NODE2 NODE3 is sorted alphabetically.
Upvotes: 0