Reputation: 3938
I am trying to have a vector of seconds between two time intervals:
import numpy as np
import pandas as pd
date="2011-01-10"
start=np.datetime64(date+'T09:30:00')
end=np.datetime64(date+'T16:00:00')
range = pd.date_range(start, end, freq='S')
For some reason when I print range
I get:
[2011-01-10 17:30:00, ..., 2011-01-11 00:00:00]
So the length is 23401 which is what I want but definitely not the correct time interval. Why is that?
Also, if I have a DataFrame df
with a column of datetime64 format that looks like:
Time
15:59:57.887529007
15:59:57.805383290
Once I solved the problem above, will I be able to do the following:
data = df.reindex(df.Time + range) data = data.ffill()
??
I need to do the exact steps proposed here except with datetime64
format. Is it possible?
Upvotes: 3
Views: 648
Reputation: 61
Is it because when you specify the datetime as a string, numpy assumes it is in localtime and converts it to UTC.
Specifying the time offset gives the correct interval though the interval is in UTC
start=np.datetime64(date+'T09:30:00+0000')
end=np.datetime64(date+'T16:00:00+0000')
range=pd.date_range(start,end,freq='S')
Or using a datetime.datetime object as the start and end and again the interval here is in UTC
import datetime
start = datetime.datetime(2011, 1, 10, 9, 30, 0)
end = datetime.datetime(2011, 1, 10, 16, 0, 0)
range=pd.date_range(start,end,freq='S')
Upvotes: 2
Reputation: 375675
It seems that pandas date_range
is dropping the timezone (looks like a bug, I think it's already filed...), you can use Timestamp rather than datetime64 to workaround this:
In [11]: start = pd.Timestamp(date+'T09:30:00')
In [12]: end = pd.Timestamp(date+'T16:00:00')
In [13]: pd.date_range(start, end, freq='S')
Out[13]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-10 09:30:00, ..., 2011-01-10 16:00:00]
Length: 23401, Freq: S, Timezone: None
Note: To see it's a timezone, you're in UTC-8, and 14:00 + 8:00 == 00:00 (the next day).
Upvotes: 3