How to sort and select pandas data

Question

I am brand new to pandas so please excuse how basic this question is. I have a CSV file which I read with

df = pandas.read_csv("file.csv")

I would like to perform some basic functions now on the data. For example

Sort by column 11 divided by column 8.
Select only those records with a particular string contained in field 6.

How can you do that?

Some example data:

931,Oxfordshire,9314125,123255,Larkmead School,Abingdon,125,124,20,SUPP,8
931,Oxfordshire,9314126,123256,John Mason School,Abingdon,164,164,25,6,16
931,Oxfordshire,9314127,123257,Fitzharrys School,Abingdon,150,149,9,0,11

By deleting the first few rows of comments in the CSV file and then

df = pandas.read_csv("GCSEIGCSEresultsv2.csv", header=None, names=['A','B','C','D','E','F','G', 'H','I','J'])

I get

df.dtypes
Out[20]: 
A    object
B     int64
C     int64
D    object
E    object
F    object
G    object
H    object
I    object
J    object
dtype: object

I think I need to tell pandas that SUPP means NaN.

CT Zhu · Accepted Answer

Suppose I name your columns from c1 to c11:

c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11
931,Oxfordshire,9314125,123255,Larkmead School,Abingdon,125,124,20,SUPP,8
931,Oxfordshire,9314126,123256,John Mason School,Abingdon,164,164,25,6,16
931,Oxfordshire,9314127,123257,Fitzharrys School,Abingdon,150,149,9,0,11
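To load the data with these names and have SUPP treated as missing, something like the following should work. It is only a sketch: the three sample rows are inlined through io.StringIO in place of your file, and na_values is the read_csv parameter that lists extra strings to read as NaN. Note that it passes eleven names for eleven columns; with only ten names, pandas appears to have used your first column as the index, which would explain the dtypes you got.

```python
import io
import pandas as pd

# Sample rows inlined for illustration; in practice pass the CSV path instead.
csv_text = """931,Oxfordshire,9314125,123255,Larkmead School,Abingdon,125,124,20,SUPP,8
931,Oxfordshire,9314126,123256,John Mason School,Abingdon,164,164,25,6,16
931,Oxfordshire,9314127,123257,Fitzharrys School,Abingdon,150,149,9,0,11
"""

# c1 ... c11, one name per column in the file.
names = ['c%d' % i for i in range(1, 12)]

df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,          # the file has no header row
    names=names,
    na_values=['SUPP'],   # read the marker SUPP as NaN
)

print(df.dtypes)
```

With SUPP read as NaN, c10 becomes a numeric column instead of object.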

to sort:

df['r_c11c8'] = df['c11'] * 1.0 / df['c8']  # the * 1.0 forces a float result if these columns have an int dtype
df = df.sort_values(by='r_c11c8')  # returns a sorted copy, so assign it back
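A self-contained sketch of the sort, using sort_values, which is the method current pandas provides for sorting by a column. The small frame is a stand-in built from the c5, c8 and c11 values of the three sample rows, and the column name r_c11c8 is just a label chosen for the ratio.

```python
import pandas as pd

# Stand-in frame holding only the columns needed for the ratio.
df = pd.DataFrame({
    'c5': ['Larkmead School', 'John Mason School', 'Fitzharrys School'],
    'c8': [124, 164, 149],
    'c11': [8, 16, 11],
})

# Column 11 divided by column 8.
df['r_c11c8'] = df['c11'] / df['c8']

# sort_values returns a new DataFrame and leaves df itself unchanged.
smallest_first = df.sort_values(by='r_c11c8')
largest_first = df.sort_values(by='r_c11c8', ascending=False)

print(smallest_first[['c5', 'r_c11c8']])
```

If some ratios are NaN (for instance after reading SUPP as missing), sort_values places them last by default; na_position='first' moves them to the top.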

to select only those records with a particular string contained in field 6:

df[df['c6'].str.contains('Abingdon')]

or, if field 6 has to equal the string exactly:

df[df['c6']=='Abingdon']
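If the string only needs to appear somewhere in the field, str.contains is probably the closer fit to what you asked. Here is a minimal sketch; the rows are made up for illustration (they are not from your file), so that the difference from an exact comparison is visible.

```python
import pandas as pd

# Hypothetical rows, chosen to show substring vs exact matching.
df = pd.DataFrame({
    'c5': ['School A', 'School B', 'School C', 'School D'],
    'c6': ['Abingdon', 'abingdon-on-thames', 'Oxford', None],
})

# Substring match: ignore case, treat the pattern as plain text
# (not a regular expression), and count missing values as no match.
mask = df['c6'].str.contains('abingdon', case=False, regex=False, na=False)
contains_rows = df[mask]

# Exact match, for comparison.
exact_rows = df[df['c6'] == 'Abingdon']

print(contains_rows)
print(exact_rows)
```

Passing na=False matters when the column has missing values, because otherwise the mask itself contains missing entries and cannot be used directly for selection.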
