Reputation: 915
I am downloading intraday 1-minute prices for over 3000 stocks from a database. Each record has three components:
(time, symbol, price)
I would like to generate a pandas.DataFrame where the row index is time and the column index is symbol. My strategy is to first build a double-indexed Series and then convert it into a DataFrame. The conversion is fast. However, changing the time (epoch time) into a pandas.DatetimeIndex is very slow; it usually takes about 15 minutes, because I first have to convert each time to a string like "2013-09-23T09:45:00" and then use the pandas.DatetimeIndex function, roughly as sketched below.
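Roughly, the current approach looks like this (the symbols and values are just illustrative):

import time
import pandas as pd

# illustrative (epoch_seconds, symbol, price) records
records = [(1379929500, 'AAPL', 489.60), (1379929500, 'GOOG', 886.50),
           (1379929560, 'AAPL', 489.70), (1379929560, 'GOOG', 886.90)]

epochs, symbols, prices = zip(*records)

# the slow part: epoch -> string -> DatetimeIndex, one element at a time
strings = [time.strftime('%Y-%m-%dT%H:%M:%S', time.gmtime(e)) for e in epochs]
times = pd.DatetimeIndex(strings)

# double-indexed Series, then unstack to a time x symbol DataFrame (this part is fast)
s = pd.Series(list(prices), index=[times, list(symbols)])
df = s.unstack()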
Is there a faster way to do this?
Upvotes: 2
Views: 333
Reputation: 129018
See the docs for pandas.to_datetime. First build a couple of example Timestamps and their millisecond-epoch equivalents:
In [1]: stamps = [ Timestamp('20130101 09:01:01'),Timestamp('20130102 09:10:00') ]
In [2]: stamps
Out[2]:
[Timestamp('2013-01-01 09:01:01', tz=None),
Timestamp('2013-01-02 09:10:00', tz=None)]
In [6]: ms_epoch = [ int(s.value/1e6) for s in stamps ]
In [7]: ms_epoch
Out[7]: [1357030861000, 1357117800000]
For the conversion, specify unit='ms' for millisecond epochs:
In [8]: pd.to_datetime(ms_epoch,unit='ms')
Out[8]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 09:01:01, 2013-01-02 09:10:00]
Length: 2, Freq: None, Timezone: None
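Applied to the layout in the question, the whole index can be built in one vectorized call (assuming the epochs from the database are in seconds, hence unit='s'; the records below are illustrative):

import pandas as pd

# illustrative (epoch_seconds, symbol, price) records
records = [(1379929500, 'AAPL', 489.60), (1379929500, 'GOOG', 886.50),
           (1379929560, 'AAPL', 489.70), (1379929560, 'GOOG', 886.90)]

epochs, symbols, prices = zip(*records)

# vectorized: integer epochs -> DatetimeIndex, no per-element string formatting
times = pd.to_datetime(list(epochs), unit='s')

# same double-indexed Series -> DataFrame step as before
s = pd.Series(list(prices), index=[times, list(symbols)])
df = s.unstack()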
Converting 10 million stamps is also efficient:
In [20]: base = int(Timestamp('20130101 09:00:01').value / 1e6)  # millisecond epoch of the start time

In [21]: ms_epoch = np.repeat(base, int(1e7)) + np.arange(int(1e7))
In [22]: len(ms_epoch)
Out[22]: 10000000
In [26]: %timeit pd.to_datetime(ms_epoch,unit='ms')
1 loops, best of 3: 1.04 s per loop
In [27]: pd.to_datetime(ms_epoch,unit='ms')[0:5]
Out[27]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 09:00:01, ..., 2013-01-01 09:00:01.004000]
Length: 5, Freq: None, Timezone: None
In [28]: pd.to_datetime(ms_epoch,unit='ms')[-5:]
Out[28]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 11:46:40.995000, ..., 2013-01-01 11:46:40.999000]
Length: 5, Freq: None, Timezone: None
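To see where the 15 minutes goes, you can time the string round-trip against the direct conversion on the same array (a rough benchmark sketch; absolute numbers will vary by machine and pandas version):

import numpy as np
import pandas as pd
from timeit import timeit

base = int(pd.Timestamp('20130101 09:00:01').value / 1e6)
ms_epoch = np.repeat(base, int(1e6)) + np.arange(int(1e6))  # 1 million ms epochs

# direct, vectorized conversion
direct = timeit(lambda: pd.to_datetime(ms_epoch, unit='ms'), number=1)

# string round-trip similar to the question's approach
def via_strings():
    strings = [pd.Timestamp(int(e) * int(1e6)).strftime('%Y-%m-%dT%H:%M:%S') for e in ms_epoch]
    return pd.DatetimeIndex(strings)

roundtrip = timeit(via_strings, number=1)
print(direct, roundtrip)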
Upvotes: 1