adr

Reputation: 2071

Looking for a faster way to replace year in pandas DatetimeIndex

I have a DataFrame with about 20 million rows and a DatetimeIndex. There are data from different years, and I would like to assign them all to the same year by changing the timestamps. The statements below accomplish this, but are a bit slower than I would like (double-digit seconds).

new_index = [ts.replace(year=2014) for ts in df.index]
df.index = new_index

The data are not evenly spaced, so I cannot generate a new index easily. Is there a better way?

Upvotes: 4

Views: 1076

Answers (3)

perl

Reputation: 9941

Try building the new index with pd.to_datetime:

%%time
new_index = pd.to_datetime({
    'year': 2014,
    'month': df.index.month,
    'day': df.index.day})

CPU times: user 333 ms, sys: 34.4 ms, total: 367 ms
Wall time: 346 ms

Compared to the original:

%%time
new_index = [ts.replace(year=2014) for ts in df.index]

CPU times: user 6.97 s, sys: 115 ms, total: 7.08 s
Wall time: 7.1 s

The timings are for a dataset of 1M rows, but I would expect a similar improvement for 20M.

Also, of course, if hours/minutes/seconds are to be preserved, they should be added to the dictionary passed to to_datetime as well.
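To preserve the time of day, the same approach might look like the sketch below. The helper name `replace_year` and the sample timestamps are made up for illustration and are not part of the original answer; sub-second precision would need additional keys.

```python
import pandas as pd


def replace_year(index: pd.DatetimeIndex, year: int) -> pd.DatetimeIndex:
    """Rebuild the index with a fixed year, keeping month, day and time of day.

    Hypothetical helper for illustration only.
    """
    parts = pd.DataFrame({
        'year': year,            # scalar, broadcast to every row
        'month': index.month,
        'day': index.day,
        'hour': index.hour,
        'minute': index.minute,
        'second': index.second,
    })
    # to_datetime assembles one timestamp per row from the named columns
    return pd.DatetimeIndex(pd.to_datetime(parts))


# Illustrative, unevenly spaced timestamps from different years
idx = pd.DatetimeIndex(['2011-03-04 09:08:07', '2013-12-31 23:59:59'])
new_idx = replace_year(idx, 2014)
```

Note that a 29 February timestamp has no counterpart in a non-leap year such as 2014, so the assembly would raise an error for such rows unless they are handled separately (for example by passing errors='coerce' to to_datetime, which yields NaT for them).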

Upvotes: 5

Rajesh

Reputation: 786

You can try:

df.index = pd.DatetimeIndex(df.index)
df.index = df.index + pd.DateOffset(year=2016)
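One detail worth spelling out: in pd.DateOffset the singular keyword year= replaces the year, while the plural years= shifts by that many years. Below is a minimal sketch with made-up timestamps (not taken from the question). On a whole DatetimeIndex the replacing form may not be applied in a vectorized way, and pandas can emit a PerformanceWarning, so it is worth timing this on the real data before relying on it.

```python
import warnings

import pandas as pd

ts = pd.Timestamp('2013-03-04 09:08:00')  # illustrative timestamp

# Singular keyword: set the year to 2016, keep everything else
replaced = ts + pd.DateOffset(year=2016)

# Plural keyword: move forward by one year
shifted = ts + pd.DateOffset(years=1)

# Same replacement applied to a small, illustrative index
idx = pd.DatetimeIndex(['2011-01-02 09:08:00', '2013-06-07 10:11:12'])
with warnings.catch_warnings():
    # Silence the possible PerformanceWarning about non-vectorized offsets
    warnings.simplefilter('ignore')
    new_idx = idx + pd.DateOffset(year=2016)
```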

Upvotes: 0

Dhiraj Bansal

Reputation: 455

Please try the following:

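import datetime as dt
import pandas as pd
# strftime returns strings, so parse them back to keep a DatetimeIndex
df.index = pd.to_datetime(df.index.to_series().apply(lambda x: dt.datetime.strftime(x, '2016-%m-%d %H:%M:%S')).tolist())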

In the example above, I change the year to 2016, and the output is as follows:

df

                       open    high     low   close
2016-01-02 09:08:00  116.00  116.00  116.00  116.00
2016-01-02 09:16:00  116.10  117.80  117.00  113.00
2016-01-03 09:07:00  115.50  116.20  115.50  116.20
2016-01-02 09:19:00  116.00  116.00  115.60  115.75
2016-01-02 09:18:00  116.05  116.35  116.00  116.00

Upvotes: 0
