Reputation: 13327
I've got a dataframe that looks like the following (subset):
0 2014-01-23 17:01:09
1000 2014-01-23 17:38:28
2000 2014-01-23 18:08:28
3000 2014-01-23 18:33:10
4000 2014-01-24 14:19:28
5000 2014-01-24 14:31:04
6000 2014-01-23 18:34:53
7000 2014-01-23 17:15:36
8000 2014-01-23 19:31:45
9000 2014-01-23 19:23:06
10000 2014-01-23 17:58:15
Name: date, dtype: object
As you can see the column type of column date
is object
. So, I'd like to sort this column chronologically. Therefore in my code I call
data = data.sort(['date'])
however, the result looks like this:
0 2014-01-23 17:01:09
1000 2014-01-23 17:38:28
2000 2014-01-23 18:08:28
3000 2014-01-23 18:33:10
4000 2014-01-24 14:19:28
5000 2014-01-24 14:31:04
6000 2014-01-23 18:34:53
7000 2014-01-23 17:15:36
8000 2014-01-23 19:31:45
9000 2014-01-23 19:23:06
10000 2014-01-23 17:58:15
Name: date, dtype: object
So, nothing was sorted. But why does it not work?
UPDATE: I now found the following stuff on the internet:
import dateutil
data['date'] = data['date'].apply(dateutil.parser.parse)
print str(data.dtypes)
yields:
date datetime64[ns]
However, even now that the type is explicitly set to a datetime object I'm still not able to sort it chronologically. Now, I'm really confused.
UPDATE 2: Now I did the following in the code
data['date'] = pd.to_datetime(data['date'])
data = data.sort(['date'])
However, it still has no effect. No sorting w.r.t. date.
Upvotes: 1
Views: 5311
Reputation: 173
Thank you, this help me a lot.
Only thing I noticed is
data = data.sort(['date'])
Is now deprecated it should be:
data = data.sort_values(by='date')
Upvotes: 2
Reputation: 13327
Okay, I now found out. The index of the dataframe is also permutated according to the sorting order of the dataframe. So, the output of
print str(data['date'].ix[np.array(range(0,11))*1000])
print str(data[:20])
actually is different. Therefore I did not see any effect on the sorting, although everything was properly sorted already. However, this is extremly counter intuitive!
In order to create a corrected index order one can repair the permutation by:
data = data.sort(['date'])
data.index = range(0,len(data))
ADDITIONAL COMMENT: Actually it would be great if in the documentation of pandas there would be a remark on such a pitfall (and others). It nearly drove me crazy and only a collegue knew this by coincidence.
Upvotes: 1
Reputation: 10872
Convert your date to pandas datetime. like:
df['date'] = pd.to_datetime(df['date'])
Upvotes: 0