nfmcclure

Reputation: 3141

Datetime and Timestamp equality in Python and Pandas

I've been playing around with datetimes and timestamps, and I've come across something that I can't understand.

import pandas as pd
import datetime

year_month = pd.DataFrame({'year':[2001,2002,2003], 'month':[1,2,3]})
year_month['date'] = [datetime.datetime(y, m, 1) for y, m in zip(year_month['year'], year_month['month'])]

>>> year_month
  month  year       date
0     1  2001 2001-01-01
1     2  2002 2002-02-01
2     3  2003 2003-03-01

I think the unique function is changing the timestamps somehow:

first_date = year_month['date'].unique()[0]

>>> first_date == year_month['date'][0]
False

In fact:

>>> year_month['date'].unique()
array(['2000-12-31T16:00:00.000000000-0800',
       '2002-01-31T16:00:00.000000000-0800',
       '2003-02-28T16:00:00.000000000-0800'], dtype='datetime64[ns]')

My suspicion is that there is some sort of timezone difference underneath the functions, but I can't figure it out.

EDIT

I just tried Python's list(set()) as an alternative to the unique function, and the comparison works with that. This must be a quirk of the unique() function.

Upvotes: 1

Views: 3573

Answers (1)

EdChum

Reputation: 393973

You have to convert to datetime64 to compare:

In [12]:
first_date == year_month['date'][0].to_datetime64()
Out[12]:

True
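
The conversion should also work in the other direction, by wrapping the NumPy value in pd.Timestamp. Below is a minimal sketch under a few assumptions: `dates` is a hypothetical stand-in for the question's column, and `to_numpy()` is used instead of `unique()` so that a NumPy scalar is obtained regardless of what `unique()` returns in a given pandas version.

```python
import datetime

import numpy as np
import pandas as pd

# Hypothetical stand-in for year_month['date'] from the question.
dates = pd.Series([
    datetime.datetime(2001, 1, 1),
    datetime.datetime(2002, 2, 1),
    datetime.datetime(2003, 3, 1),
])

np_value = dates.to_numpy()[0]  # numpy.datetime64 scalar
ts = dates[0]                   # pandas Timestamp

# Option 1: lift the NumPy value to a Timestamp, then compare.
same_as_timestamp = bool(pd.Timestamp(np_value) == ts)

# Option 2 (as in the answer): lower the Timestamp to datetime64.
same_as_datetime64 = bool(np_value == ts.to_datetime64())

print(same_as_timestamp, same_as_datetime64)
```

Either way, both operands end up with the same type before `==` is evaluated, which avoids relying on how a mixed comparison is dispatched.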

This is because unique has converted the dtype to datetime64:

In [6]:    
first_date = year_month['date'].unique()[0]
first_date

Out[6]:
numpy.datetime64('2001-01-01T00:00:00.000000000+0000')
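
The value printed here and the first value printed in the question look different, but they appear to describe the same instant, only rendered with different UTC offsets. A quick arithmetic check using just the standard library, assuming the -0800 suffix in the question's output is a display offset (the variable names are illustrative):

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-08:00 offset, matching the "-0800" suffix in the question's output.
minus_eight = timezone(timedelta(hours=-8))

# First value as printed in the question: 2000-12-31T16:00:00-0800
shown_in_question = datetime(2000, 12, 31, 16, 0, tzinfo=minus_eight)

# Value as printed in this answer: 2001-01-01T00:00:00+0000
shown_in_answer = datetime(2001, 1, 1, 0, 0, tzinfo=timezone.utc)

# Aware datetimes compare by the instant they represent, not by wall-clock text.
same_instant = shown_in_question == shown_in_answer
print(same_instant)
```

So the stored value was probably not shifted. The failed comparison comes from the type difference rather than from a changed date.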

I think this is because unique returns a NumPy array, and NumPy currently has no dtype that represents a pandas Timestamp, so the values come back as datetime64. See also: Converting between datetime, Timestamp and datetime64
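
To see the type difference directly, one can inspect what each access path returns. This is only a sketch: `dates` is a hypothetical stand-in for the question's column, and `to_numpy()` is used to get the underlying NumPy values explicitly, since the return type of `unique()` may differ between pandas versions.

```python
import datetime

import numpy as np
import pandas as pd

# Hypothetical stand-in for year_month['date'] from the question.
dates = pd.Series([
    datetime.datetime(2001, 1, 1),
    datetime.datetime(2002, 2, 1),
])

# The column itself is stored with a datetime64 dtype.
column_dtype = str(dates.dtype)

# Element access through pandas boxes the value as a Timestamp,
# which is also a subclass of datetime.datetime.
element = dates[0]

# Going through the raw NumPy array yields a numpy.datetime64 scalar instead.
raw = dates.to_numpy()[0]

print(column_dtype)
print(type(element).__name__, type(raw).__name__)
```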

Upvotes: 1
