szu

Float date conversion

I'm creating the same date range twice, once with pandas and once with Matplotlib. After converting the numpy.float64 value back to a pandas Timestamp, I get a difference of about one minute. Why?

import pandas as pd
import matplotlib.dates as mdates
import datetime as dt

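# Plain dates (not datetimes) as the range endpoints
dstart = dt.date(2013,12,5)
dend = dt.date(2013,12,10)

# Hourly range from pandas, and the matching float day numbers from Matplotlib
d1 = pd.date_range(dstart, dend, freq='H')
d2 = mdates.drange(dstart, dend, dt.timedelta(hours=1))

# Compare the third element of each range
print(d1[2])
print(pd.Timestamp(mdates.num2date(d2[2])))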

The output is:

2013-12-05 02:00:00
2013-12-05 02:01:00.504201+00:00

Answers (1)

joris

Note that the two ranges also differ in length:

>>> len(d1)
121
>>> len(d2)
120

I think this can be considered a bug in mdates.drange. However, the error is triggered because you pass dates instead of datetimes, and the docstring says the inputs should be datetimes. At the very least, mdates.drange could check for this.
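Until drange validates its input itself, one workaround is to promote plain dates to midnight datetimes before calling it. Below is a minimal, stdlib-only sketch. The helper name `ensure_datetime` is hypothetical and is not part of Matplotlib:

```python
import datetime as dt


def ensure_datetime(d):
    """Hypothetical helper: promote a plain date to a naive datetime at midnight.

    datetime is a subclass of date, so it has to be checked first.
    """
    if isinstance(d, dt.datetime):
        return d  # already a datetime: keep its time (and tzinfo) unchanged
    if isinstance(d, dt.date):
        return dt.datetime.combine(d, dt.time())  # midnight, no tzinfo
    raise TypeError(f"expected date or datetime, got {type(d).__name__}")


# Usage (assumes matplotlib.dates is imported as mdates):
# d2 = mdates.drange(ensure_datetime(dstart), ensure_datetime(dend),
#                    dt.timedelta(hours=1))
```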
When you use datetimes, the result is as expected:

In [50]: dstart = dt.datetime(2013,12,5)
In [51]: dend = dt.datetime(2013,12,10)
In [52]: d1 = pd.date_range(dstart, dend, freq='H')
In [53]: d2 = mdates.drange(dstart, dend, dt.timedelta(hours=1))
In [54]: print(d1[2])
2013-12-05 02:00:00

In [55]: print(pd.Timestamp(mdates.num2date(d2[2])))
2013-12-05 02:00:00+00:00

The lengths still differ because mdates.drange produces a half-open interval, so dend is not included. pd.date_range produces a closed interval, with both endpoints included.
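You can reproduce the 121 vs 120 difference with the standard library alone. This sketch is not pandas or Matplotlib code. It counts hourly points between the two datetimes twice: once including the end point, as pd.date_range does, and once excluding it, as mdates.drange does. The function name `hourly` is just for illustration:

```python
import datetime as dt

start = dt.datetime(2013, 12, 5)
end = dt.datetime(2013, 12, 10)
step = dt.timedelta(hours=1)


def hourly(start, end, step, include_end):
    """Return start, start + step, ... up to end (closed) or stopping before end (half-open)."""
    points = []
    t = start
    while t < end or (include_end and t == end):
        points.append(t)
        t += step
    return points


closed = hourly(start, end, step, include_end=True)      # like pd.date_range
half_open = hourly(start, end, step, include_end=False)  # like mdates.drange

print(len(closed), len(half_open))
```

The 5-day span covers 120 hours. The closed version therefore has 121 points (hours 0 through 120), and the half-open version has 120 (hours 0 through 119).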


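Technically, this fails because mdates.drange calculates the end value of the range incorrectly when it is given a date (https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/dates.py#L361). In your case the end value should be 2013-12-09 23:00, one hour before dend. However, adding or subtracting a timedelta to or from a date ignores everything smaller than a whole day, so the hour is dropped and the range ends at 2013-12-10 instead. drange then spreads its 120 points evenly over those 5 days, which is 119 steps across 120 hours. Each step is therefore 120/119 h (about 1 h 0 min 30.25 s), and the third value lands at 02:01:00.504.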
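To make the failure concrete, here is a simplified, stdlib-only re-implementation of the end-point logic described above. It is a sketch under assumptions, not Matplotlib's actual code. The names `to_days` and `drange_sketch` are hypothetical, and day numbers are counted from `date.toordinal()` rather than Matplotlib's own epoch, which does not affect the spacing between points:

```python
import datetime as dt
import math

SEC_PER_DAY = 86400.0


def to_days(d):
    """Float day number, a stand-in for mdates.date2num (ordinal-based epoch)."""
    days = float(d.toordinal())
    if isinstance(d, dt.datetime):
        # Only datetimes carry a time of day; plain dates contribute nothing here.
        seconds = d.hour * 3600 + d.minute * 60 + d.second + d.microsecond / 1e6
        days += seconds / SEC_PER_DAY
    return days


def drange_sketch(dstart, dend, delta):
    """Hypothetical sketch of drange: half-open range [dstart, dend) as float days."""
    f1 = to_days(dstart)
    step = delta.total_seconds() / SEC_PER_DAY
    num = int(math.ceil((to_days(dend) - f1) / step))
    # Estimate the last point with date arithmetic on the inputs.
    last = dstart + num * delta
    if last >= dend:
        # Step back once to exclude dend. For a plain date this is a no-op,
        # because date - timedelta(hours=1) ignores the hours.
        last -= delta
        num -= 1
    f2 = to_days(last)
    if num == 0:
        return [f1]
    # Evenly spaced points from f1 to f2, like np.linspace(f1, f2, num + 1).
    return [f1 + (f2 - f1) * i / num for i in range(num + 1)]


hour = dt.timedelta(hours=1)
for start, end in [(dt.date(2013, 12, 5), dt.date(2013, 12, 10)),
                   (dt.datetime(2013, 12, 5), dt.datetime(2013, 12, 10))]:
    r = drange_sketch(start, end, hour)
    offset = (r[2] - r[0]) * SEC_PER_DAY  # seconds from the start to the third point
    print(type(start).__name__, len(r), round(offset, 3))
```

Both cases produce 120 points. With plain dates, the third point lands about 7260.5 s (2 h 1 min 0.5 s) after the start, which matches the question's output. With datetimes, it lands two hours after the start.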
