Reputation: 337
dta_h is a DataFrame, and dta_h.Datetime looks like this:
0 2013-03-01 00:00:00
1 2013-02-28 23:00:00
2 2013-02-28 22:00:00
3 2013-02-28 21:00:00
...
Name: Datetime, Length: 63001, dtype: datetime64[ns]
Until recently (I'll explain below what that means) I could do this to subtract one hour from each timestamp:
dta_h.Datetime-np.timedelta(hours=1)
But now, if I do the above, I am getting this:
0 2013-03-01 00:11:34.967296
1 2013-02-28 23:11:34.967296
2 2013-02-28 22:11:34.967296
3 2013-02-28 21:11:34.967296
...
Which clearly is not what I want. However, this:
[i-timedelta(hours=1) for i in dta_h.Datetime ]
still yields the desired result:
0 2013-02-28 23:00:00
1 2013-02-28 22:00:00
2 2013-02-28 21:00:00
3 2013-02-28 20:00:00
....
Length: 63001, dtype: datetime64[ns]
I am 99% sure that this problem started when I upgraded to Pandas 0.11. I have looked through the documentation for any change between the versions that might explain it, without success. I also found this post:
pandas handling of numpy timedelta64[ms]
which refers to this Pandas issue
https://github.com/pydata/pandas/issues/3009
Based on what I read there, I tried:
dta_h.Datetime-np.timedelta64(hours=1)
But this actually does nothing:
0 2013-03-01 00:00:00
1 2013-02-28 23:00:00
2 2013-02-28 22:00:00
3 2013-02-28 21:00:00
...
Any idea 1) why the df-np.timedelta version stopped working, and 2) why the list comprehension version still works? Thanks for your help.
FYI, I am using Numpy 1.6.2 and, as I said earlier, recently upgraded from Pandas 0.9 to 0.11.
Upvotes: 3
Views: 1880
Reputation: 129048
Numpy 1.6.1/1.6.2 is quite buggy for timedeltas. It works for intervals < 30 minutes (I have no idea why). Your best bet is to upgrade to numpy 1.7.0/1.7.1, which is much more stable, and use datetime.timedelta:
In [33]: df = DataFrame(dict(date = [Timestamp('20130301'),Timestamp('20130228 23:00:00'),Timestamp('20130228 22:00:00'),Timestamp('20130228 21:00:00')]))
In [34]: df
Out[34]:
date
0 2013-03-01 00:00:00
1 2013-02-28 23:00:00
2 2013-02-28 22:00:00
3 2013-02-28 21:00:00
In [37]: df['date'] + timedelta(hours=1)
Out[37]:
0 2013-03-01 01:00:00
1 2013-03-01 00:00:00
2 2013-02-28 23:00:00
3 2013-02-28 22:00:00
Name: date, dtype: datetime64[ns]
In [38]: np.__version__
Out[38]: '1.7.1'
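For reference, here is the same idea as a standalone script rather than an IPython session. This is only a minimal sketch, assuming a reasonably recent pandas and numpy; the Series name `dates` and the variable `shifted` are made up for illustration. It subtracts one hour with datetime.timedelta and checks that the vectorized result agrees with the list comprehension from the question.

```python
from datetime import timedelta

import pandas as pd

# Same four timestamps as in the example above.
dates = pd.Series(
    pd.to_datetime(
        [
            "2013-03-01 00:00:00",
            "2013-02-28 23:00:00",
            "2013-02-28 22:00:00",
            "2013-02-28 21:00:00",
        ]
    ),
    name="date",
)

# Vectorized subtraction of a plain datetime.timedelta.
shifted = dates - timedelta(hours=1)

# Element-wise version, as in the question's list comprehension.
shifted_loop = [t - timedelta(hours=1) for t in dates]

print(shifted)
```

If the two versions ever disagree, that points at the numpy/pandas combination rather than at your code.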
Upvotes: 2
Reputation: 375865
You can use the time in nanoseconds:
In [11]: df - pd.np.timedelta64(60*60*10**9) # one hour in nanoseconds
Out[11]:
date
index
0 2013-02-28 23:00:00
1 2013-02-28 22:00:00
2 2013-02-28 21:00:00
3 2013-02-28 20:00:00
Keyword arguments appear to be ignored by timedelta64:
In [12]: df - pd.np.timedelta64(foo=60*60*10**9)
Out[12]:
date
index
0 2013-03-01 00:00:00
1 2013-02-28 23:00:00
2 2013-02-28 22:00:00
3 2013-02-28 21:00:00
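Rather than keyword arguments, np.timedelta64 takes a count and a unit code as positional arguments. A small sketch of that form, assuming numpy 1.7 or later (the variable names are just for illustration):

```python
import numpy as np

# One hour, written with an explicit unit code instead of a raw nanosecond count.
one_hour = np.timedelta64(1, "h")

# The same duration spelled out in nanoseconds, with the unit stated.
one_hour_ns = np.timedelta64(60 * 60 * 10**9, "ns")

stamps = np.array(
    ["2013-03-01T00:00:00", "2013-02-28T23:00:00"],
    dtype="datetime64[ns]",
)

# numpy converts between units when subtracting.
shifted = stamps - one_hour
print(shifted)
```

Stating the unit explicitly avoids having to remember that a bare integer is interpreted in the array's own resolution.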
It feels like you ought to be able to use pandas offsets:
df.date - pd.offsets.Hour(1)
ValueError: cannot operate on a series with out a rhs of a series/ndarray of type datetime64[ns] or a timedelta
At the moment you can do this with an apply or the delta attribute:
In [21]: df.date.apply(lambda t: t - pd.offsets.Hour(1))
Out[21]:
index
0 2013-02-28 23:00:00
1 2013-02-28 22:00:00
2 2013-02-28 21:00:00
3 2013-02-28 20:00:00
Name: date, dtype: datetime64[ns]
In [22]: df.date - pd.offsets.Hour(1).delta
Out[22]:
index
0 2013-02-28 23:00:00
1 2013-02-28 22:00:00
2 2013-02-28 21:00:00
3 2013-02-28 20:00:00
Name: date, dtype: datetime64[ns]
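As far as I know, later pandas releases (well after the 0.11 discussed here) added pd.Timedelta and let you subtract an offset like Hour directly from a Series, so the apply workaround should no longer be needed there. A hedged sketch, assuming a recent pandas; `dates` is a made-up name:

```python
import pandas as pd

dates = pd.Series(
    pd.to_datetime(["2013-03-01 00:00:00", "2013-02-28 23:00:00"]),
    name="date",
)

# pd.Timedelta accepts keyword arguments such as hours=1.
via_timedelta = dates - pd.Timedelta(hours=1)

# Fixed-frequency offsets such as Hour can be subtracted directly
# in recent versions (assumption: this raised ValueError in 0.11, as shown above).
via_offset = dates - pd.offsets.Hour(1)

print(via_timedelta)
```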
Upvotes: 1