cc1000ml

Reputation: 67

How to sort without column names using pandas

My data is:

import pandas
A=pandas.read_csv(r'D:\AUL_prediction\Merge_file\plasmid',sep='   ',header=None, engine='python')
print(A)

result is:

                 0     1                 2            3  
                 0     plasmid.gb        NC021289.1    75   
                 1     plasmid.gb        NC016815.1   763   
                 2     plasmid.gb      NZCP011480.1   102   
                 3     plasmid.gb        NC017324.1  1278   
                 4     plasmid.gb        NC007488.2    32   
                 5     plasmid.gb        NC019848.2   632   
                 6     plasmid.gb      NZCP007644.1   208   
                 7     plasmid.gb        NC007336.1    46   
                 8     plasmid.gb      NZCP012748.1   402   
                 9     plasmid.gb      NZCP011248.1   353   

I want to sort this data by A[3] and then by A[2]. Does anyone know how to do this? I tried sort_values, but it does not recognize the column names '0' or '1'.

Upvotes: 2

Views: 12023

Answers (3)

Carson

Reputation: 8088

I'm not sure why you insist on not using a header.

If the original data really has no header row, that isn't a problem.

You can assign column names to the DataFrame yourself, which also makes the code more readable.

import pandas as pd
from io import StringIO

data = """
plasmid.gb,NC021289.1,75   
plasmid.gb,NC016815.1,763   
plasmid.gb,NZCP011480.1,102   
plasmid.gb,NC017324.1,1278   
plasmid.gb,NC007488.2,32   
plasmid.gb,NC019848.2,632   
plasmid.gb,NZCP007644.1,208   
plasmid.gb,NC007336.1,46   
plasmid.gb,NZCP012748.1,402   
plasmid.gb,NZCP011248.1,3
"""

df = pd.read_csv(StringIO(data), sep=',', header=None, engine='python')
print('BEFORE\n', df)
df.columns = ['file', 'event-id', 'value']
print('\nAFTER\n', df.sort_values(['value', 'event-id'], ascending=[False, True]))

Output:

BEFORE
             0             1     2
0  plasmid.gb    NC021289.1    75
1  plasmid.gb    NC016815.1   763
2  plasmid.gb  NZCP011480.1   102
3  plasmid.gb    NC017324.1  1278
4  plasmid.gb    NC007488.2    32
5  plasmid.gb    NC019848.2   632
6  plasmid.gb  NZCP007644.1   208
7  plasmid.gb    NC007336.1    46
8  plasmid.gb  NZCP012748.1   402
9  plasmid.gb  NZCP011248.1     3

AFTER
          file      event-id  value
3  plasmid.gb    NC017324.1   1278
1  plasmid.gb    NC016815.1    763
5  plasmid.gb    NC019848.2    632
8  plasmid.gb  NZCP012748.1    402
6  plasmid.gb  NZCP007644.1    208
2  plasmid.gb  NZCP011480.1    102
0  plasmid.gb    NC021289.1     75
7  plasmid.gb    NC007336.1     46
4  plasmid.gb    NC007488.2     32
9  plasmid.gb  NZCP011248.1      3

Upvotes: 1

Yasi Klingler

Reputation: 636

The question is old, but I've just come across the same problem.

When you do not have column headers, take the column label from df.columns and pass it to df.sort_values positionally, without the by keyword. The solution:

df = df.sort_values(df.columns[i])

where df is A in your case and i is the position of the column.
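To sort by two columns, as the question asks, the same idea should extend to a list of labels. The following is a minimal sketch, assuming a shortened, comma-separated copy of the question's rows (the real file uses a different separator and has an extra column):

```python
import pandas as pd
from io import StringIO

# Sample rows adapted from the question; comma-separated here for simplicity.
data = """plasmid.gb,NC021289.1,75
plasmid.gb,NC016815.1,763
plasmid.gb,NZCP011480.1,102
plasmid.gb,NC007488.2,32
"""

df = pd.read_csv(StringIO(data), header=None)

# With header=None the labels are the integers 0, 1, 2,
# not the strings '0', '1', '2'.
print(df.columns.tolist())

# Sort by the last column first, then by the middle column to break ties.
by_position = df.sort_values([df.columns[2], df.columns[1]])

# Equivalent call using the integer labels directly.
by_label = df.sort_values([2, 1])
print(by_position)
```

Both calls give the same ordering. Going through df.columns[i] just avoids having to know what the labels are.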

Upvotes: 10

zipa

Reputation: 27879

First go with:

f = A.columns.values.tolist()

to see what the actual names of your columns are. Then you can try:

A.sort_values(by=f[:2])

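And if you sort by column name, keep in mind that the names are integers, not strings. Python 2 may display them as long ints such as 2L, but that literal is a syntax error in Python 3, and a plain integer works in both, so just go:

A.sort_values(by=[2])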
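The difference between integer and string labels is probably why sort_values appeared not to recognize '0' or '1' in the question. Here is a small sketch, assuming a hypothetical two-row frame built without column names:

```python
import pandas as pd

# Hypothetical small frame without column names,
# so the labels default to the integers 0, 1, 2.
A = pd.DataFrame([["plasmid.gb", "NC021289.1", 75],
                  ["plasmid.gb", "NC007488.2", 32]])

labels = A.columns.values.tolist()
print(labels)

# The string '2' is not a column label, so pandas raises KeyError.
try:
    A.sort_values(by=['2'])
    string_label_ok = True
except KeyError:
    string_label_ok = False

# The integer 2 is a valid label.
sorted_A = A.sort_values(by=[2])
print(sorted_A)
```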

Upvotes: 1
