Reputation: 1155
I am trying to get the difference in years between a pandas DataFrame column and a datetime object using a custom function (years_between). Here's what the DataFrame column looks like:
input_1['dataadmissao'].head(5)
0 2018-02-10
1 2009-08-23
2 2015-05-21
3 2016-12-17
4 2019-02-01
Name: dataadmissao, dtype: datetime64[ns]
And here's my code:
from datetime import datetime

import numpy as np

###################### function to return difference in years ####################
def years_between(start_year, end_year):
    start_year = datetime.strptime(start_year, "%d/%m/%Y")
    end_year = datetime.strptime(end_year, "%d/%m/%Y")
    return abs(end_year.year - start_year.year)

input_1['difference_in_years'] = np.vectorize(years_between(input_1['dataadmissao'], datetime.now()))
Which returns:
TypeError: strptime() argument 1 must be str, not Series
How can I adjust the function so that it returns an integer representing the difference in years between the DataFrame column and datetime.now()?
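For reference, the error comes from passing a whole Series (and an already-parsed datetime) to datetime.strptime, which only accepts a string; np.vectorize(years_between(...)) also calls the function immediately on the whole Series instead of wrapping it. Since the column is already datetime64, no parsing is needed. The sketch below keeps a per-row helper and applies it with Series.apply, which hands each element over as a pd.Timestamp. The sample DataFrame is rebuilt from the head(5) output above, and the fixed reference date is a hypothetical stand-in for datetime.now() so the result is reproducible.

```python
from datetime import datetime

import pandas as pd


def years_between(start, end):
    """Absolute difference in calendar years between two datetime-like values.

    Both arguments are expected to be datetime-like already (pd.Timestamp or
    datetime.datetime), so no strptime parsing is done here.
    """
    return abs(end.year - start.year)


# Sample data rebuilt from the head(5) output in the question.
input_1 = pd.DataFrame(
    {"dataadmissao": pd.to_datetime(
        ["2018-02-10", "2009-08-23", "2015-05-21", "2016-12-17", "2019-02-01"])}
)

# Hypothetical fixed reference date; use datetime.now() in practice.
reference = datetime(2021, 2, 3)

# Series.apply calls the helper once per element, passing a pd.Timestamp.
input_1["difference_in_years"] = input_1["dataadmissao"].apply(
    lambda start: years_between(start, reference)
)
print(input_1["difference_in_years"].tolist())  # [3, 12, 6, 5, 2]
```

This is simple but loops in Python; the vectorized approaches below avoid the per-row call.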
Upvotes: 1
Views: 173
Reputation: 57033
Simply subtract the series from datetime.datetime.now(), divide by the duration of one year, and convert to an integer:
from datetime import datetime
import numpy as np
((datetime.now() - input_1['dataadmissao'])/np.timedelta64(1, 'Y')).astype(int)
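Depending on your pandas version, dividing by np.timedelta64(1, 'Y') may raise an error, because a "year" unit has no fixed length and recent pandas releases may no longer accept it. A sketch of the same subtract-divide-truncate idea that avoids the year unit, assuming an average Gregorian year of 365.2425 days (the AVG_YEAR constant and the fixed reference date are illustrative choices, not part of the answer above):

```python
from datetime import datetime

import pandas as pd

# Assumption: average Gregorian year length, standing in for np.timedelta64(1, 'Y').
AVG_YEAR = pd.Timedelta(days=365.2425)

dates = pd.Series(pd.to_datetime(
    ["2018-02-10", "2009-08-23", "2015-05-21", "2016-12-17", "2019-02-01"]))

# Hypothetical fixed reference date; use datetime.now() in practice.
reference = datetime(2021, 2, 3)

# Subtract to get timedeltas, divide by the year length, truncate toward zero.
elapsed_years = ((reference - dates) / AVG_YEAR).astype(int)
print(elapsed_years.tolist())  # [2, 11, 5, 4, 2]
```

Note that this counts elapsed (fractional, then truncated) years, so it can differ from a plain subtraction of calendar years: 2018-02-10 gives 2 here rather than 3.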
Upvotes: 1
Reputation: 15872
Use pandas.Timestamp.now:
>>> df
0 2018-02-10
1 2009-08-23
2 2015-05-21
3 2016-12-17
4 2019-02-01
Name: 1, dtype: datetime64[ns]
>>> pd.Timestamp.now() - df
0 1089 days 02:41:50.467993
1 4182 days 02:41:50.467993
2 2085 days 02:41:50.467993
3 1509 days 02:41:50.467993
4 733 days 02:41:50.467993
Name: 1, dtype: timedelta64[ns]
# If you want days
>>> (pd.Timestamp.now() - df).dt.days
0 1089
1 4182
2 2085
3 1509
4 733
Name: 1, dtype: int64
# If you want years (difference in calendar years)
>>> (pd.Timestamp.now().year - df.dt.year)
0 3
1 12
2 6
3 5
4 2
Name: 1, dtype: int64
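Subtracting .dt.year gives the difference in calendar years, so 2009-08-23 counts as 12 even though its 12th anniversary has not yet passed. If you instead need completed full years, a minimal vectorized sketch (the completed_years helper and the fixed reference date are hypothetical) subtracts one wherever this year's anniversary is still ahead:

```python
import pandas as pd


def completed_years(dates: pd.Series, reference: pd.Timestamp) -> pd.Series:
    """Number of full years elapsed from each date up to `reference`."""
    years = reference.year - dates.dt.year
    # True where the anniversary in the reference year has not been reached yet.
    before_anniversary = (dates.dt.month > reference.month) | (
        (dates.dt.month == reference.month) & (dates.dt.day > reference.day)
    )
    return years - before_anniversary.astype(int)


df = pd.Series(pd.to_datetime(
    ["2018-02-10", "2009-08-23", "2015-05-21", "2016-12-17", "2019-02-01"]))

# Hypothetical fixed reference date; use pd.Timestamp.now() in practice.
print(completed_years(df, pd.Timestamp("2021-02-03")).tolist())  # [2, 11, 5, 4, 2]
```

Comparing month and day directly avoids any assumption about year length; with this rule a 29 February date reaches its anniversary on 1 March in non-leap years.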
Upvotes: 1