Reputation: 2071
I have a DataFrame with about 20 million rows and a DatetimeIndex. The data span several years, and I would like to assign all rows to the same year by changing the timestamps. The statements below accomplish this but are slower than I would like (double-digit seconds).
new_index = [ts.replace(year=2014) for ts in df.index]
df.index = new_index
The data are not evenly spaced, so I cannot generate a new index easily. Is there a better way?
Upvotes: 4
Views: 1076
Reputation: 9941
Try with:
%%time
new_index = pd.to_datetime({
'year': 2014,
'month': df.index.month,
'day': df.index.day})
CPU times: user 333 ms, sys: 34.4 ms, total: 367 ms
Wall time: 346 ms
Compared to the original:
%%time
new_index = [ts.replace(year=2014) for ts in df.index]
CPU times: user 6.97 s, sys: 115 ms, total: 7.08 s
Wall time: 7.1 s
The timings are for a 1M-row dataset, but I would expect a similar improvement for 20M rows.
Also, of course, if hours/minutes/seconds are to be preserved, they should be passed to to_datetime as well.
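As a minimal sketch of that last point, the same to_datetime call can carry the time-of-day components along. The sample index and the `value` column are made up for illustration; sub-second precision would presumably need further keys (such as `ms`, `us`, `ns`), and a Feb 29 timestamp cannot be mapped onto a non-leap year like 2014.

```python
import pandas as pd

# Hypothetical sample data standing in for the 20M-row DataFrame.
idx = pd.DatetimeIndex([
    "2012-03-01 09:08:15",
    "2013-07-15 12:30:00",
    "2015-11-30 23:59:59",
])
df = pd.DataFrame({"value": [1, 2, 3]}, index=idx)

# Assemble new timestamps column-wise: the scalar year is broadcast,
# every other component is taken from the existing index.
assembled = pd.to_datetime({
    "year": 2014,
    "month": df.index.month,
    "day": df.index.day,
    "hour": df.index.hour,
    "minute": df.index.minute,
    "second": df.index.second,
})

# to_datetime returns a Series here; wrap it so the index stays a DatetimeIndex.
df.index = pd.DatetimeIndex(assembled)
```

All rows now fall in 2014 while keeping their original month, day, and time of day.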
Upvotes: 5
Reputation: 786
You can try:
df.index = pd.DatetimeIndex(df.index)
df.index = df.index + pd.DateOffset(year=2016)
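The snippet above relies on the singular `year=` keyword, which replaces the year, whereas the plural `years=` adds to it. A small sketch of the difference, using made-up timestamps; note that this offset may not be applied in a vectorized way (pandas can emit a PerformanceWarning), so it is not necessarily faster on 20M rows.

```python
import pandas as pd

ts = pd.Timestamp("2013-05-17 09:08:00")

# Singular keyword: set the year to 2016, keep everything else.
replaced = ts + pd.DateOffset(year=2016)

# Plural keyword: move forward by one year instead.
shifted = ts + pd.DateOffset(years=1)

# The same replacement applied to a whole (hypothetical) index.
# Depending on the pandas version this may warn about a non-vectorized offset.
idx = pd.DatetimeIndex(["2012-03-01 09:08:15", "2015-11-30 23:59:59"])
new_idx = idx + pd.DateOffset(year=2016)
```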
Upvotes: 0
Reputation: 455
Please try the following:
import datetime as dt
df.index = df.index.to_series().apply(lambda x: dt.datetime.strftime(x, '2016-%m-%d %H:%M:%S')).tolist()
In the example above, I change the year to 2016, and the output is as follows:
df
                       open    high     low   close
2016-01-02 09:08:00  116.00  116.00  116.00  116.00
2016-01-02 09:16:00  116.10  117.80  117.00  113.00
2016-01-03 09:07:00  115.50  116.20  115.50  116.20
2016-01-02 09:19:00  116.00  116.00  115.60  115.75
2016-01-02 09:18:00  116.05  116.35  116.00  116.00
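One caveat: strftime produces strings, so after the assignment above the index holds plain strings rather than timestamps. A possible variant, sketched here with made-up sample rows, formats the whole index at once with DatetimeIndex.strftime and parses the result back with pd.to_datetime.

```python
import pandas as pd

# Hypothetical sample rows; only the index matters for this sketch.
idx = pd.DatetimeIndex(["2013-01-02 09:08:00", "2012-01-03 09:07:00"])
df = pd.DataFrame({"close": [116.00, 116.20]}, index=idx)

# Format every timestamp in one call; '2016' is a literal in the
# format string, so the original year is simply dropped.
as_text = df.index.strftime("2016-%m-%d %H:%M:%S")

# Parse the strings back so the index is a DatetimeIndex again.
df.index = pd.to_datetime(as_text, format="%Y-%m-%d %H:%M:%S")
```

The round trip through strings is unlikely to be the fastest option on 20M rows, but it keeps the index usable for date-based selection.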
Upvotes: 0