Reputation: 1028
I encountered strange, very unexpected behavior in the round method of pandas.DatetimeIndex:
import pandas as pd
import datetime as dt
t1 = pd.DatetimeIndex([dt.datetime(2013, 12, 5, 1, 30, 0),
                       dt.datetime(2013, 12, 5, 2, 30, 0),
                       dt.datetime(2013, 12, 5, 3, 30, 0),
                       dt.datetime(2013, 12, 5, 4, 30, 0)])
print(t1)
gives:
DatetimeIndex(['2013-12-05 01:30:00', '2013-12-05 02:30:00',
'2013-12-05 03:30:00', '2013-12-05 04:30:00'],
dtype='datetime64[ns]', freq=None)
So far, so good. Now I want to round to the nearest full hour. I don't mind if the next or the previous hour is chosen. But I need consistent behavior.
t2 = t1.round('H')
print(t2)
Surprisingly I get:
DatetimeIndex(['2013-12-05 02:00:00', '2013-12-05 02:00:00',
'2013-12-05 04:00:00', '2013-12-05 04:00:00'],
dtype='datetime64[ns]', freq=None)
Entries 1 and 3 got rounded up while entries 2 and 4 got rounded down. Is this the intended behavior? I guess there is some numerical stuff going on under the hood, but this is really disturbing. In my case the temporal resolution is limited to minutes, so I can add (or subtract) 1 s to every timestamp and get the desired result. But this can't be the right way to do it.
Upvotes: 1
Views: 161
Reputation: 59579
Many people learn the "round half up" rule, under which 1.5 is rounded to 2, 2.5 is rounded to 3, etc. This is not how rounding is handled in numpy. From the numpy.around documentation (emphasis my own):
For values exactly halfway between rounded decimal values, NumPy rounds to the nearest even value. Thus 1.5 and 2.5 round to 2.0, -0.5 and 0.5 round to 0.0, etc.
Thinking about your times as hour fractions, this would be the expected behavior:
np.around([1.5, 2.5, 3.5, 4.5])
#array([2., 2., 4., 4.])
(pandas defines the same behaviour, using RoundTo.NEAREST_HALF_EVEN for rounding.)
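So how do you round half up for datetimes at a given frequency?
Buried deep in the pandas internals is a RoundTo class, and the rounding mode we want is RoundTo.NEAREST_HALF_PLUS_INFTY. We still need to deal with the complication of datetimes, but pandas already handles that too, so also import the round_nsint64 function. Both live in the private pandas._libs package, so the import path and signature may differ between pandas versions.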
from pandas._libs.tslibs.timestamps import RoundTo, round_nsint64
# rounded int64s
rounded = round_nsint64(t1.view('i8'), RoundTo.NEAREST_HALF_PLUS_INFTY, 'H')
# Convert back to datetime
pd.DatetimeIndex(rounded)
#DatetimeIndex(['2013-12-05 02:00:00', '2013-12-05 03:00:00',
# '2013-12-05 04:00:00', '2013-12-05 05:00:00'],
# dtype='datetime64[ns]', freq=None)
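If you would rather not depend on private internals, the same "half goes up" result can probably be had with public API only: shift every timestamp by half the rounding interval, then floor. This is a minimal sketch rather than what pandas does internally; the 30-minute offset is hard-coded on the assumption that you are rounding to whole hours, and the names t1 and rounded_up are only for illustration.

```python
import datetime as dt

import pandas as pd

t1 = pd.DatetimeIndex([dt.datetime(2013, 12, 5, 1, 30, 0),
                       dt.datetime(2013, 12, 5, 2, 30, 0),
                       dt.datetime(2013, 12, 5, 3, 30, 0),
                       dt.datetime(2013, 12, 5, 4, 30, 0)])

# Assumption: the target frequency is one hour, so half an interval is 30 minutes.
# Adding it and flooring sends exact halves (xx:30:00) up to the next hour,
# while anything before the half-hour mark still lands on the previous hour.
rounded_up = (t1 + pd.Timedelta(minutes=30)).floor("h")

print(rounded_up)
```

Every xx:30 entry now moves to the following hour, which matches the NEAREST_HALF_PLUS_INFTY output above. For a different frequency you would change both the offset and the argument to floor.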
Upvotes: 2