Reputation: 1987
I am trying to add one year to a column of dates in a pandas DataFrame, but when I use pd.to_timedelta I get additional hours and minutes. I know I could truncate the time from the result, but I feel there must be a way to add a year precisely. My attempt is as follows:
import pandas as pd
dates = pd.DataFrame({'date':['20170101','20170102','20170103']})
dates['date'] = pd.to_datetime(dates['date'], format='%Y%m%d')
dates['date2'] = dates['date'] + pd.to_timedelta(1, unit='y')
dates
yields:
Out[1]:
        date               date2
0 2017-01-01 2018-01-01 05:49:12
1 2017-01-02 2018-01-02 05:49:12
2 2017-01-03 2018-01-03 05:49:12
How can I add a year without adding 05:49:12 HH:mm:ss?
Upvotes: 23
Views: 40597
Reputation: 1
You can access the components of a date (year, month and day) using code of the form dataframe["column"].dt.component. For example, the month component is dataframe["column"].dt.month, and the year component is dataframe["column"].dt.year.
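As a rough sketch of how those components could be used for the question's problem, the snippet below rebuilds each date with the year increased by one. The frame is recreated from the question so the snippet runs on its own; the intermediate names `years`, `months` and `days` are only illustrative, and this is one possible use of the accessor rather than the approach the answer spells out.

```python
import pandas as pd

# Recreate the frame from the question.
dates = pd.DataFrame({'date': ['20170101', '20170102', '20170103']})
dates['date'] = pd.to_datetime(dates['date'], format='%Y%m%d')

# Each .dt attribute returns an integer Series aligned with the original column.
years = dates['date'].dt.year
months = dates['date'].dt.month
days = dates['date'].dt.day

# Reassemble the dates with the year bumped by one.
# Caveat: a 29 February source date has no counterpart in a non-leap year;
# errors='coerce' turns such rows into NaT instead of raising.
dates['date2'] = pd.to_datetime(
    pd.DataFrame({'year': years + 1, 'month': months, 'day': days}),
    errors='coerce',
)
```

Because the dates are rebuilt from whole components, no time of day is introduced.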
Upvotes: 0
Reputation: 5165
Edit: This works if you don't care about leap years, etc. Otherwise see jp_data_analysis's answer.
You can use 365 and unit='d':
pd.to_timedelta(365, unit='d')
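To make the leap-year caveat concrete, here is a small sketch with two made-up dates (they are not from the question) whose 365-day window includes 29 February 2016:

```python
import pandas as pd

# Hypothetical sample dates chosen so that the next 365 days contain 2016-02-29.
s = pd.Series(pd.to_datetime(['2015-03-01', '2016-01-01']))

# A fixed 365-day shift lands one day short of the same calendar date.
shifted = s + pd.to_timedelta(365, unit='d')
```

Here `shifted` holds 2016-02-29 and 2016-12-31 rather than 2016-03-01 and 2017-01-01, which is why a fixed day count is only safe when no leap day falls inside the interval.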
Upvotes: 0
Reputation: 1022
Or convert the datetime to a date:
dates['date'] = dates['date'].apply(lambda a: a.date())
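A vectorized variant of the same idea, sketched on a small stand-in Series (the name `stamps` and its values are illustrative, mirroring the unwanted output in the question):

```python
import datetime
import pandas as pd

# Stand-in timestamps carrying the unwanted time of day.
stamps = pd.Series(pd.to_datetime(['2018-01-01 05:49:12', '2018-01-02 05:49:12']))

# .dt.date is the vectorized counterpart of .apply(lambda a: a.date()).
as_dates = stamps.dt.date
```

Note that the result holds plain Python `datetime.date` objects, so the column no longer has a datetime dtype and would need `pd.to_datetime` again before further `.dt` operations.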
Upvotes: 2
Reputation: 210972
In [99]: dates['date'] + pd.offsets.DateOffset(years=1)
Out[99]:
0 2018-01-01
1 2018-01-02
2 2018-01-03
Name: date, dtype: datetime64[ns]
Leap year check:
In [100]: pd.to_datetime(['2011-02-28', '2012-02-29']) + pd.offsets.DateOffset(years=1)
Out[100]: DatetimeIndex(['2012-02-28', '2013-02-28'], dtype='datetime64[ns]', freq=None)
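Applied to the question's frame, assigning the result to a new column might look like this (a minimal sketch; the frame is rebuilt here so the snippet is self-contained):

```python
import pandas as pd

# Recreate the frame from the question.
dates = pd.DataFrame({'date': ['20170101', '20170102', '20170103']})
dates['date'] = pd.to_datetime(dates['date'], format='%Y%m%d')

# DateOffset(years=1) shifts the calendar year and leaves the time of day untouched.
dates['date2'] = dates['date'] + pd.offsets.DateOffset(years=1)
```

Unlike a fixed-length timedelta, the offset works on calendar fields, which is why 2012-02-29 maps to 2013-02-28 in the check above.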
Upvotes: 41
Reputation: 164823
You can normalize via pd.Series.dt.normalize:
dates['date2'] = (dates['date'] + pd.to_timedelta(1, unit='y')).dt.normalize()
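The extra 05:49:12 appears because a "year" timedelta is a fixed average length (365 days, 5 hours, 49 minutes, 12 seconds), not a calendar year. As far as I know, newer pandas releases no longer accept unit='y', so the sketch below spells that duration out explicitly as a stand-in; the name `one_mean_year` is only illustrative.

```python
import pandas as pd

# Recreate the frame from the question.
dates = pd.DataFrame({'date': ['20170101', '20170102', '20170103']})
dates['date'] = pd.to_datetime(dates['date'], format='%Y%m%d')

# Assumption: this explicit duration reproduces the offset seen in the question's output.
one_mean_year = pd.Timedelta(days=365, hours=5, minutes=49, seconds=12)

shifted = dates['date'] + one_mean_year   # carries the 05:49:12 time of day
dates['date2'] = shifted.dt.normalize()   # resets the time to midnight
```

Normalizing only drops the time of day. If a leap day falls inside the interval, the date itself can still be one day off (for example, 2016-01-01 would become 2016-12-31).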
Upvotes: 4