bhjghjh

Reputation: 917

Sorting data in ascending order in pandas

I have a very long dataset that I want to sort in ascending order. I am a little confused by the output because the index does not run from 0 to n after sorting. My code looks like this:

import pandas
import numpy

def imputation(filename):
    ref = pandas.read_csv(filename, sep=',', names=['data'])

    sort_values = ref.sort_values(['data'], ascending=True)
    q = sort_values.head(10)
    return q

print(imputation("file_location"))

The output looks like the following:

                                      data
0     0.000000e+0 3.736717e-1 -8.896232e-2
1000  1.000000e-1 3.870175e-1 -8.870570e-2
100   1.000000e-2 3.749366e-1 -8.894183e-2
10    1.000000e-3 3.737975e-1 -8.896031e-2
1     1.000000e-4 3.736843e-1 -8.896212e-2
1001  1.001000e-1 3.870317e-1 -8.870538e-2
1002  1.002000e-1 3.870459e-1 -8.870506e-2
1003  1.003000e-1 3.870601e-1 -8.870474e-2
1004  1.004000e-1 3.870742e-1 -8.870442e-2
1005  1.005000e-1 3.870884e-1 -8.870410e-2

I don't know if I am doing something wrong in the code, but shouldn't I expect the index to go from 0 to n in ascending order too? Also, my data has a few thousand rows, so it apparently got split into 3 separate columns. In this output, do I start reading the data from the rightmost column? Your explanation is most appreciated.

Upvotes: 0

Views: 1179

Answers (1)

Wonjin

Reputation: 432

First, it is better to read the CSV with the proper separator (it looks like a tab, \t) and then sort by index.
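
A minimal sketch of that first suggestion, assuming the file holds three whitespace-separated numeric fields per line. The column names `x`, `a`, `b` and the inline sample are made up for illustration; the real file would be passed to `read_csv` instead of the `StringIO` object.

```python
import io
import pandas as pd

# Hypothetical sample standing in for the real file: three
# whitespace-separated numeric fields per line, in no particular order.
sample = io.StringIO(
    "1.000000e-1 3.870175e-1 -8.870570e-2\n"
    "0.000000e+0 3.736717e-1 -8.896232e-2\n"
    "1.000000e-4 3.736843e-1 -8.896212e-2\n"
    "1.000000e-2 3.749366e-1 -8.894183e-2\n"
)

# sep=r"\s+" splits on any run of spaces or tabs, so each field
# becomes its own float column instead of one long string.
ref = pd.read_csv(sample, sep=r"\s+", header=None, names=["x", "a", "b"])

# Sort numerically by the first column, then renumber the rows 0..n-1.
result = ref.sort_values("x").reset_index(drop=True)
print(result)
```

Without `reset_index(drop=True)` each row keeps the label it had before sorting, which is why a sorted frame does not show 0 to n in order.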

However, if you want to proceed from your (10000, 1)-shaped DataFrame, this may work.

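# extract the first field of the "data" column as a number
# (float rather than int, since values like 1.000000e-4 cannot be parsed by int)
# if the fields are not tab-separated, drop the '\t' argument so split() uses any whitespace
ref['index'] = ref['data'].apply(lambda x: float(x.split('\t')[0]))
# sort_values returns a new DataFrame, so keep the result
ref = ref.sort_values(by='index')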
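
The odd order in the question's output is probably because each row was read as a single string, and strings sort character by character. A small sketch of the difference, assuming space-separated fields and using made-up sample rows (the `key` column is a hypothetical helper name):

```python
import pandas as pd

# One string per row, as happens when the whole line lands in one column.
ref = pd.DataFrame({"data": [
    "1.000000e-3 3.737975e-1 -8.896031e-2",
    "0.000000e+0 3.736717e-1 -8.896232e-2",
    "1.000000e-1 3.870175e-1 -8.870570e-2",
    "1.000000e-4 3.736843e-1 -8.896212e-2",
]})

# Text sort: "1.000000e-1" comes before "1.000000e-3" because the
# strings are compared character by character, not as numbers.
as_text = ref.sort_values("data")
print(as_text.index.tolist())    # [1, 2, 0, 3]

# Numeric sort: parse the first whitespace-separated field as a float
# and sort on that instead.
key = ref["data"].str.split().str[0].astype(float)
as_number = ref.assign(key=key).sort_values("key")
print(as_number.index.tolist())  # [1, 3, 0, 2]
```

In both cases the printed labels are the original row positions, which move together with their rows when the frame is sorted.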

Upvotes: 1
