silent_hunter
silent_hunter

Reputation: 2508

Extracting years from data

I want to extract data for years in the table below. Data are in the format datetime64 and this data you can see below

import numpy as np
import pandas as pd

data = {
         'Date': ['2021-01-01','2020-01-01','2019-01-01','2028-01-01']
        }

df = pd.DataFrame(data, columns = ['Date'
                                   ])
df['Date']= df['Date'].astype('datetime64')

df

Now I want to subtract this data from 2023-01-01 in order to calculate how many years have been between this date and the data in the table . To do this I tried with the row below but unfortunately this did not work

df['Date']-['2023-01-01'].astype('datetime64')

So can anybody help me with how to solve this problem?

Upvotes: 2

Views: 68

Answers (2)

stepzar
stepzar

Reputation: 11

You can try to convert your dates in datetime format and the apply the subtraction for the column Date:

def years_to_2023(date):
  diff = datetime.datetime(2023,1,1,0,0,0) - datetime.datetime.strptime(date, "%Y-%m-%d")
  return int(diff.days/365)

df['Date'].apply(lambda date: years_to_2023(date))

Remember to:

import datetime

The output should be:

0    2  #2021
1    3  #2020
2    4  #2019
3   -5  #2028
Name: Date, dtype: int64

Upvotes: 1

mozway
mozway

Reputation: 260335

You need to convert the reference to Timestamp:

df['days'] = df['Date'] - pd.Timestamp('2023-01-01')

# or
df['days'] = df['Date'].sub(pd.Timestamp('2023-01-01'))

Output:

        Date       days
0 2021-01-01  -730 days
1 2020-01-01 -1096 days
2 2019-01-01 -1461 days
3 2028-01-01  1826 days

For the reversed operation:

df['days'] = pd.Timestamp('2023-01-01') - df['Date']

# or
df['days'] = df['Date'].rsub(pd.Timestamp('2023-01-01'))

Output:

        Date       days
0 2021-01-01   730 days
1 2020-01-01  1096 days
2 2019-01-01  1461 days
3 2028-01-01 -1826 days

Note that a preferred method to convert the Dates is to_datetime.

For years, you can further divide by np.timedelta64(1, 'Y'):

df['years'] = df['Date'].sub(pd.Timestamp('2023-01-01')).div(np.timedelta64(1, 'Y'))

Output:

        Date     years
0 2021-01-01 -1.998672
1 2020-01-01 -3.000746
2 2019-01-01 -4.000082
3 2028-01-01  4.999418

Upvotes: 1

Related Questions