Reputation: 3141
I've been playing around with datetimes and timestamps, and I've come across something that I can't understand.
import pandas as pd
import datetime
year_month = pd.DataFrame({'year':[2001,2002,2003], 'month':[1,2,3]})
year_month['date'] = [datetime.datetime.strptime(str(y) + str(m) + '1', '%Y%m%d') for y,m in zip(year_month['year'], year_month['month'])]
>>> year_month
month year date
0 1 2001 2001-01-01
1 2 2002 2002-02-01
2 3 2003 2003-03-01
I think the unique function is doing something to the timestamps that is changing them somehow:
first_date = year_month['date'].unique()[0]
>>> first_date == year_month['date'][0]
False
In fact:
>>> year_month['date'].unique()
array(['2000-12-31T16:00:00.000000000-0800',
'2002-01-31T16:00:00.000000000-0800',
'2003-02-28T16:00:00.000000000-0800'], dtype='datetime64[ns]')
My suspicions are that there is some sort of timezone difference underneath the functions, but I can't figure it out.
EDIT
I just checked the python commands list(set()) as an alternative to the unique function, and that works. This must be a quirk of the unique() function.
Upvotes: 1
Views: 3573
Reputation: 393973
You have to convert to datetime64 to compare:
In [12]:
first_date == year_month['date'][0].to_datetime64()
Out[12]:
True
This is because unique
has converted the dtype to datetime64
:
In [6]:
first_date = year_month['date'].unique()[0]
first_date
Out[6]:
numpy.datetime64('2001-01-01T00:00:00.000000000+0000')
I think is because unique
returns a np array and there is no dtype that numpy understands TimeStamp
currently: Converting between datetime, Timestamp and datetime64
Upvotes: 1