Reputation: 165
I have the below df:
OnlineDate BDate
20190813 19720116
20190809 19570912
20190807 19600601
20190801 19760919
20190816 19530916
The two columns are integer and are dates YYYYMMDD
I'm trying to get a new column that is the result in Years between these two dates.
So, the expected output is the next
OnlineDate BDate NewColumn
20190813 19720116 47
20190809 19570912 61
20190807 19600601 59
20190801 19760919 51
20190816 19530916 66
I can't just subtract the years because the days and months count to determine the year,
Do I have to create a function to do it or I can do it without one?
Upvotes: 0
Views: 197
Reputation: 1211
It requires some setup, but you're looking to convert your columns to a datetime, get the year from it and then simply subtract them to get the differences
import pandas as pd
import numpy as np
# setup
onlinedate = [20190813, 20190809, 20190807, 20190801, 20190816]
bdate = [19720116, 19570912, 19600601, 19760919, 19530916]
df = pd.DataFrame({"onlinedate":onlinedate, "bdate":bdate})
# convert to dates
onlinedate_year = pd.to_datetime(df["onlinedate"], format="%Y%M%d")
bdate_year = pd.to_datetime(df["bdate"], format="%Y%M%d")
# Setup new column, columnwise operation
# Subtract the two dates and divide by years
df["NewColumn"] = ((onlinedate_year - bdate_year)/np.timedelta64(1,'Y'))
# convert the float column in to int
df["NewColumn"] = df["NewColumn"].astype(int)
print(df)
output:
onlinedate bdate NewColumn
0 20190813 19720116 46
1 20190809 19570912 61
2 20190807 19600601 59
3 20190801 19760919 42
4 20190816 19530916 65
Upvotes: 1
Reputation: 774
Convert the data types to datetime;
for col in ['OnlineDate','BDate']:
df[col]=pd.to_datetime(df[col],format="%Y%m%d")
Subtract the years;
df['NewColumn']=df['OnlineDate'].dt.year-df['BDate'].dt.year
Upvotes: 1