yuez

Reputation: 915

efficient way to make pandas.DatetimeIndex in Python

I am downloading intraday 1-minute prices for over 3000 stocks from a database. Each record has three components:

(time, symbol, price)

I would like to generate a pandas.DataFrame whose row index is time and whose column index is symbol. My strategy is to first build a double-indexed Series and then convert it into a DataFrame. The conversion is fast. However, changing the time (epoch time) into a pandas.DatetimeIndex is very slow, usually taking about 15 minutes, because I first have to convert each time to a string like "2013-09-23T09:45:00" and then use the pandas.DatetimeIndex function.

Is there a faster way to do this?
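For reference, a minimal sketch of the strategy described above, assuming records is a list of (epoch_time, symbol, price) tuples with epoch times in seconds (the names here are illustrative, not the actual code):

from datetime import datetime
import pandas as pd

times, symbols, prices = zip(*records)

# the slow step: format every epoch time as a string, then parse the strings
time_strings = [datetime.utcfromtimestamp(t).strftime('%Y-%m-%dT%H:%M:%S')
                for t in times]
idx = pd.MultiIndex.from_arrays([pd.DatetimeIndex(time_strings), list(symbols)],
                                names=['time', 'symbol'])

# the fast step: double-indexed Series -> DataFrame
df = pd.Series(prices, index=idx).unstack('symbol')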

Upvotes: 2

Views: 333

Answers (1)

Jeff

Reputation: 129018

See the docs.

In [1]: stamps = [ Timestamp('20130101 09:01:01'),Timestamp('20130102 09:10:00') ]

In [2]: stamps
Out[2]: 
[Timestamp('2013-01-01 09:01:01', tz=None),
 Timestamp('2013-01-02 09:10:00', tz=None)]

In [6]: ms_epoch = [ int(s.value/1e6) for s in stamps ]   # Timestamp.value is ns since the epoch, so /1e6 gives ms

In [7]: ms_epoch
Out[7]: [1357030861000, 1357117800000]

The conversion: specify unit='ms' for millisecond epochs.

In [8]: pd.to_datetime(ms_epoch,unit='ms')
Out[8]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 09:01:01, 2013-01-02 09:10:00]
Length: 2, Freq: None, Timezone: None
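If the epochs are in seconds rather than milliseconds (a guess at the form the intraday data actually arrives in), the same call works with unit='s':

# second-resolution epochs for the same two instants as above
pd.to_datetime([1357030861, 1357117800], unit='s')
# DatetimeIndex(['2013-01-01 09:01:01', '2013-01-02 09:10:00'], ...)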

Convert 10 million stamps efficiently

In [20]: base = int(Timestamp('20130101 09:00:01').value / 1e6)   # ms since the epoch

In [21]: ms_epoch = np.repeat(base,int(1e7)) + np.arange(int(1e7))

In [22]: len(ms_epoch)
Out[22]: 10000000

In [26]: %timeit pd.to_datetime(ms_epoch,unit='ms')
1 loops, best of 3: 1.04 s per loop

In [27]: pd.to_datetime(ms_epoch,unit='ms')[0:5]
Out[27]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 09:00:01, ..., 2013-01-01 09:00:01.004000]
Length: 5, Freq: None, Timezone: None

In [28]: pd.to_datetime(ms_epoch,unit='ms')[-5:]
Out[28]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 11:46:40.995000, ..., 2013-01-01 11:46:40.999000]
Length: 5, Freq: None, Timezone: None
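Applied to the question, the whole pipeline can skip the string step: convert the epoch array once with pd.to_datetime, build the double index, and unstack. A sketch under the same assumptions as above (records of (epoch_s, symbol, price) tuples, names hypothetical):

import numpy as np
import pandas as pd

times, symbols, prices = zip(*records)

idx = pd.MultiIndex.from_arrays(
    [pd.to_datetime(np.asarray(times), unit='s'),   # vectorized, no string formatting
     np.asarray(symbols)],
    names=['time', 'symbol'])

df = pd.Series(prices, index=idx).unstack('symbol')   # rows: time, columns: symbol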

Upvotes: 1
