Reputation: 2508
I want to extract data for years in the table below. Data are in the format datetime64
and this data you can see below
import numpy as np
import pandas as pd
data = {
'Date': ['2021-01-01','2020-01-01','2019-01-01','2028-01-01']
}
df = pd.DataFrame(data, columns = ['Date'
])
df['Date']= df['Date'].astype('datetime64')
df
Now I want to subtract this data from 2023-01-01 in order to calculate how many years have been between this date and the data in the table . To do this I tried with the row below but unfortunately this did not work
df['Date']-['2023-01-01'].astype('datetime64')
So can anybody help me with how to solve this problem?
Upvotes: 2
Views: 68
Reputation: 11
You can try to convert your dates in datetime
format and the apply the subtraction for the column Date
:
def years_to_2023(date):
diff = datetime.datetime(2023,1,1,0,0,0) - datetime.datetime.strptime(date, "%Y-%m-%d")
return int(diff.days/365)
df['Date'].apply(lambda date: years_to_2023(date))
Remember to:
import datetime
The output should be:
0 2 #2021
1 3 #2020
2 4 #2019
3 -5 #2028
Name: Date, dtype: int64
Upvotes: 1
Reputation: 260335
You need to convert the reference to Timestamp
:
df['days'] = df['Date'] - pd.Timestamp('2023-01-01')
# or
df['days'] = df['Date'].sub(pd.Timestamp('2023-01-01'))
Output:
Date days
0 2021-01-01 -730 days
1 2020-01-01 -1096 days
2 2019-01-01 -1461 days
3 2028-01-01 1826 days
For the reversed operation:
df['days'] = pd.Timestamp('2023-01-01') - df['Date']
# or
df['days'] = df['Date'].rsub(pd.Timestamp('2023-01-01'))
Output:
Date days
0 2021-01-01 730 days
1 2020-01-01 1096 days
2 2019-01-01 1461 days
3 2028-01-01 -1826 days
Note that a preferred method to convert the Dates is to_datetime
.
For years, you can further divide by np.timedelta64(1, 'Y')
:
df['years'] = df['Date'].sub(pd.Timestamp('2023-01-01')).div(np.timedelta64(1, 'Y'))
Output:
Date years
0 2021-01-01 -1.998672
1 2020-01-01 -3.000746
2 2019-01-01 -4.000082
3 2028-01-01 4.999418
Upvotes: 1