toom
toom

Reputation: 13327

Sorting python pandas dataframe with respect to a column of type object

I've got a dataframe that looks like the following (subset):

0        2014-01-23 17:01:09
1000     2014-01-23 17:38:28
2000     2014-01-23 18:08:28
3000     2014-01-23 18:33:10
4000     2014-01-24 14:19:28
5000     2014-01-24 14:31:04
6000     2014-01-23 18:34:53
7000     2014-01-23 17:15:36
8000     2014-01-23 19:31:45
9000     2014-01-23 19:23:06
10000    2014-01-23 17:58:15
Name: date, dtype: object

As you can see the column type of column date is object. So, I'd like to sort this column chronologically. Therefore in my code I call

data = data.sort(['date'])

however, the result looks like this:

0        2014-01-23 17:01:09
1000     2014-01-23 17:38:28
2000     2014-01-23 18:08:28
3000     2014-01-23 18:33:10
4000     2014-01-24 14:19:28
5000     2014-01-24 14:31:04
6000     2014-01-23 18:34:53
7000     2014-01-23 17:15:36
8000     2014-01-23 19:31:45
9000     2014-01-23 19:23:06
10000    2014-01-23 17:58:15
Name: date, dtype: object

So, nothing was sorted. But why does it not work?

UPDATE: I now found the following stuff on the internet:

import dateutil
data['date'] = data['date'].apply(dateutil.parser.parse)
print str(data.dtypes)

yields:

date        datetime64[ns]

However, even now that the type is explicitly set to a datetime object I'm still not able to sort it chronologically. Now, I'm really confused.

UPDATE 2: Now I did the following in the code

    data['date'] = pd.to_datetime(data['date'])
    data = data.sort(['date'])

However, it still has no effect. No sorting w.r.t. date.

Upvotes: 1

Views: 5311

Answers (3)

eafloresf
eafloresf

Reputation: 173

Thank you, this help me a lot.

Only thing I noticed is

data = data.sort(['date'])

Is now deprecated it should be:

data = data.sort_values(by='date')

Upvotes: 2

toom
toom

Reputation: 13327

Okay, I now found out. The index of the dataframe is also permutated according to the sorting order of the dataframe. So, the output of

print str(data['date'].ix[np.array(range(0,11))*1000])
print str(data[:20])

actually is different. Therefore I did not see any effect on the sorting, although everything was properly sorted already. However, this is extremly counter intuitive!

In order to create a corrected index order one can repair the permutation by:

data = data.sort(['date'])
data.index = range(0,len(data))

ADDITIONAL COMMENT: Actually it would be great if in the documentation of pandas there would be a remark on such a pitfall (and others). It nearly drove me crazy and only a collegue knew this by coincidence.

Upvotes: 1

dartdog
dartdog

Reputation: 10872

Convert your date to pandas datetime. like:

df['date'] = pd.to_datetime(df['date'])

Upvotes: 0

Related Questions