P3P0
P3P0

Reputation: 165

Calculate age between two integer columns

I have the below df:

OnlineDate    BDate
20190813      19720116
20190809      19570912
20190807      19600601
20190801      19760919
20190816      19530916

The two columns are integer and are dates YYYYMMDD

I'm trying to get a new column that is the result in Years between these two dates.

So, the expected output is the next

OnlineDate    BDate       NewColumn
20190813      19720116       47
20190809      19570912       61
20190807      19600601       59
20190801      19760919       51
20190816      19530916       66

I can't just subtract the years because the days and months count to determine the year,

Do I have to create a function to do it or I can do it without one?

Upvotes: 0

Views: 197

Answers (2)

Simon
Simon

Reputation: 1211

It requires some setup, but you're looking to convert your columns to a datetime, get the year from it and then simply subtract them to get the differences

import pandas as pd
import numpy as np

# setup
onlinedate = [20190813, 20190809, 20190807, 20190801, 20190816]
bdate = [19720116, 19570912, 19600601, 19760919, 19530916]

df = pd.DataFrame({"onlinedate":onlinedate, "bdate":bdate})

# convert to dates
onlinedate_year = pd.to_datetime(df["onlinedate"], format="%Y%M%d")
bdate_year = pd.to_datetime(df["bdate"], format="%Y%M%d")
# Setup new column, columnwise operation
# Subtract the two dates and divide by years
df["NewColumn"] = ((onlinedate_year - bdate_year)/np.timedelta64(1,'Y'))
# convert the float column in to int
df["NewColumn"] = df["NewColumn"].astype(int)


print(df)

output:

   onlinedate     bdate  NewColumn
0    20190813  19720116         46
1    20190809  19570912         61
2    20190807  19600601         59
3    20190801  19760919         42
4    20190816  19530916         65

Upvotes: 1

Boskosnitch
Boskosnitch

Reputation: 774

Convert the data types to datetime;

for col in ['OnlineDate','BDate']:
    df[col]=pd.to_datetime(df[col],format="%Y%m%d")

Subtract the years;

df['NewColumn']=df['OnlineDate'].dt.year-df['BDate'].dt.year

Upvotes: 1

Related Questions