Reputation: 145
I would like to ask what the best 'Date' datatype is to use for the following problem:
I am reading timeseries data from an ASCII file and creating two numpy arrays; 1) date
, 2) data
. Once created, I would like to extract data
from a date range specified from a different ASCII file (start_time
, and duration
in seconds). These 'Data' would be extracted into another numpy array and written to an output file.
So far I have:
import numpy as np
date = ['2016-07-05 12:00:00.000000',
'2016-07-05 12:00:00.010000',
'2016-07-05 12:00:00.020000',
'2016-07-05 12:00:00.030000',
'2016-07-05 12:00:00.040000',
'2016-07-05 12:00:00.050000',
'2016-07-05 12:00:00.060000',
'2016-07-05 12:00:00.070000',
'2016-07-05 12:00:00.080000',
'2016-07-05 12:00:00.090000',
'2016-07-05 12:00:00.100000',
'2016-07-05 12:00:00.110000',
'2016-07-05 12:00:00.120000',
'2016-07-05 12:00:00.130000',
'2016-07-05 12:00:00.140000']
data = [1,2,3,4,5,6,7,8,9,10,11,12,13,14]
date = np.asarray(date, dtype=np.string_)
data = np.asarray(data, dtype=np.float32)
Then in order to extract from the desired date range I have:
start_time = '2016-07-05 12:00:00.030000'
duration = 0.10
a=[]
a=data[(date >= (start_time) & (date <= (start_time + duration))]
And then turn 'a' into an array for writing into another file:
a=np.asarray(a, dtype='float32')
However - just about every kind of dtype or datetime class I have tried for date
gives me some sort of error (e.g. object has no attribute
_getitem_
etc.,) or is not compatible with a list. What format would you use in this scenario? POSIX?
Thanks!
Upvotes: 1
Views: 5619
Reputation: 24945
I would use pandas for this. there is really good support for time series stuff, see the docs. You would probably want to use a time series index for more detailed work, here I am using it as a normal column.
Note, your data and date have a different number of rows, I've added a 0 to the data:
import pandas as pd
date = ['2016-07-05 12:00:00.000000',
'2016-07-05 12:00:00.010000',
'2016-07-05 12:00:00.020000',
'2016-07-05 12:00:00.030000',
'2016-07-05 12:00:00.040000',
'2016-07-05 12:00:00.050000',
'2016-07-05 12:00:00.060000',
'2016-07-05 12:00:00.070000',
'2016-07-05 12:00:00.080000',
'2016-07-05 12:00:00.090000',
'2016-07-05 12:00:00.100000',
'2016-07-05 12:00:00.110000',
'2016-07-05 12:00:00.120000',
'2016-07-05 12:00:00.130000',
'2016-07-05 12:00:00.140000']
data = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14]
x = pd.DataFrame({'date':date, 'data':data})
x.date = pd.to_datetime(x.date)
We now have a pandas dataframe with two columns, your date in datetime format, and your data.
Using the to_datetime function, we now have a datetime column, and can easily subset:
start_time = pd.to_datetime('2016-07-05 12:00:00.030000')
end_time = start_time + pd.DateOffset(seconds = 0.10) #or minutes etc etc
x[(x.date < end_time) & (x.date > start_time)]
giving:
data date
4 4 2016-07-05 12:00:00.040
5 5 2016-07-05 12:00:00.050
6 6 2016-07-05 12:00:00.060
7 7 2016-07-05 12:00:00.070
8 8 2016-07-05 12:00:00.080
9 9 2016-07-05 12:00:00.090
10 10 2016-07-05 12:00:00.100
11 11 2016-07-05 12:00:00.110
12 12 2016-07-05 12:00:00.120
Upvotes: 1