Reputation: 67
My data is:
import pandas
A=pandas.read_csv(r'D:\AUL_prediction\Merge_file\plasmid',sep=' ',header=None, engine='python')
print(A)
The result is:
0 1 2 3
0 plasmid.gb NC021289.1 75
1 plasmid.gb NC016815.1 763
2 plasmid.gb NZCP011480.1 102
3 plasmid.gb NC017324.1 1278
4 plasmid.gb NC007488.2 32
5 plasmid.gb NC019848.2 632
6 plasmid.gb NZCP007644.1 208
7 plasmid.gb NC007336.1 46
8 plasmid.gb NZCP012748.1 402
9 plasmid.gb NZCP011248.1 353
I want to sort this data by A[3] and then by A[2]. Does anyone know how to do this? I tried sort_values, but it does not recognize the column names '0' or '1'.
Upvotes: 2
Views: 12023
Reputation: 8088
I'm not sure why you insist on not using a header.
If the original data really has no header row, that isn't a problem:
you can assign column names to the DataFrame after reading it, which is also more readable for other programmers.
import pandas as pd
from io import StringIO
data = """
plasmid.gb,NC021289.1,75
plasmid.gb,NC016815.1,763
plasmid.gb,NZCP011480.1,102
plasmid.gb,NC017324.1,1278
plasmid.gb,NC007488.2,32
plasmid.gb,NC019848.2,632
plasmid.gb,NZCP007644.1,208
plasmid.gb,NC007336.1,46
plasmid.gb,NZCP012748.1,402
plasmid.gb,NZCP011248.1,3
"""
df = pd.read_csv(StringIO(data), sep=',', header=None, engine='python')
print('BEFORE\n', df)
df.columns = ['file', 'event-id', 'value']
print('\nAFTER\n', df.sort_values(['value', 'event-id'], ascending=[False, True]))
Output:
BEFORE
0 1 2
0 plasmid.gb NC021289.1 75
1 plasmid.gb NC016815.1 763
2 plasmid.gb NZCP011480.1 102
3 plasmid.gb NC017324.1 1278
4 plasmid.gb NC007488.2 32
5 plasmid.gb NC019848.2 632
6 plasmid.gb NZCP007644.1 208
7 plasmid.gb NC007336.1 46
8 plasmid.gb NZCP012748.1 402
9 plasmid.gb NZCP011248.1 3
AFTER
file event-id value
3 plasmid.gb NC017324.1 1278
1 plasmid.gb NC016815.1 763
5 plasmid.gb NC019848.2 632
8 plasmid.gb NZCP012748.1 402
6 plasmid.gb NZCP007644.1 208
2 plasmid.gb NZCP011480.1 102
0 plasmid.gb NC021289.1 75
7 plasmid.gb NC007336.1 46
4 plasmid.gb NC007488.2 32
9 plasmid.gb NZCP011248.1 3
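As a variation on the above, the names can also be assigned while reading instead of afterwards. This is only a minimal sketch: the three-row sample is made up in the same shape as the data above, and the names 'file', 'event-id' and 'value' are the same illustrative labels used in this answer, not anything present in the original file.

```python
import pandas as pd
from io import StringIO

# Small sample in the same shape as the data above (no header row).
data = """plasmid.gb,NC021289.1,75
plasmid.gb,NC016815.1,763
plasmid.gb,NC007488.2,32
"""

# header=None says the first line is data, not a header;
# names= assigns the column labels while reading.
df = pd.read_csv(StringIO(data), header=None,
                 names=['file', 'event-id', 'value'])

# Sort by value (descending), then by event-id (ascending) to break ties.
sorted_df = df.sort_values(['value', 'event-id'], ascending=[False, True])
print(sorted_df)
```

This keeps the reading and naming in one call, so there is no intermediate frame with the default 0, 1, 2 labels.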
Upvotes: 1
Reputation: 636
The question is old, but I've just come across the same problem!
When you do not have a column header, pass the column label itself to df.sort_values (you can leave out the by keyword). The solution:
df = df.sort_values(df.columns[i])
where df in your case is A and i is the position of the column.
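The same idea extends to sorting by more than one column. A minimal sketch, assuming a small made-up frame with the default integer labels (the rows here are only a sample, not the full file):

```python
import pandas as pd

# Hypothetical frame with default integer column labels 0, 1, 2.
A = pd.DataFrame([
    ['plasmid.gb', 'NC021289.1', 75],
    ['plasmid.gb', 'NC016815.1', 763],
    ['plasmid.gb', 'NC007488.2', 32],
])

# Look the labels up by position: third column is the primary key,
# second column is the tie-breaker.
by_position = [A.columns[2], A.columns[1]]
result = A.sort_values(by_position)
print(result)
```

Because the labels are taken from A.columns, this works whether the columns are named or still carry their default integer labels.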
Upvotes: 10
Reputation: 27879
First go with:
f = A.columns.values.tolist()
to see what the actual names of your columns are. Then you can try:
A.sort_values(by=f[:2])
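And if you sort by column name, keep in mind that the label is an integer, not a string. In Python 2 it shows up as 2L (a long int), so just go:
A.sort_values(by=[2L])  # Python 2 only; in Python 3 write by=[2]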
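For anyone on Python 3, here is a small sketch of why the string names are not recognized. The frame is a made-up sample with default labels; the point is only that the labels are the integers 0, 1, 2, so the string '2' is not found.

```python
import pandas as pd

# Hypothetical sample with default integer column labels.
A = pd.DataFrame([
    ['plasmid.gb', 'NC021289.1', 75],
    ['plasmid.gb', 'NC016815.1', 763],
    ['plasmid.gb', 'NC007488.2', 32],
])

labels = A.columns.tolist()  # integer labels, not strings

# A string label does not match an integer label, so this lookup fails.
try:
    A.sort_values(by='2')
    string_label_worked = True
except KeyError:
    string_label_worked = False

# Passing the integer labels works: sort by column 2, then column 1.
result = A.sort_values(by=[2, 1])
print(labels, string_label_worked)
print(result)
```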
Upvotes: 1