Reputation: 2119
I have two different columns in my dataset,
       start         end
0 2015-01-01  2017-01-01
1 2015-01-02  2015-06-02
2 2015-01-03  2015-12-03
3 2015-01-04  2020-11-25
4 2015-01-05  2025-07-27
I want the difference between start and end in a specific way. Here's my desired output:
year_diff  month_diff
        2           1
        0           6
        0          12
        5          11
       10           7
Here the day is not important to me, only the month and year. I tried converting to periods to get the difference, but that returns the difference in months only. How can I achieve my desired output?
df['end'].dt.to_period('M') - df['start'].dt.to_period('M')
Upvotes: 0
Views: 84
Reputation: 127
This solution assumes that the number of days that make up a year (365) and a month (30) is constant. If the datetimes are strings, convert them into datetime objects first. In a Pandas DataFrame this can be done like so:
import pandas as pd

def to_datetime(dataframe):
    # Parse the string columns 0 and 1 into datetime64 columns.
    new_dataframe = pd.DataFrame()
    new_dataframe[0] = pd.to_datetime(dataframe[0], format="%Y-%m-%d")
    new_dataframe[1] = pd.to_datetime(dataframe[1], format="%Y-%m-%d")
    return new_dataframe
Next, column 0 can be subtracted from column 1 to give the difference in days. We can divide this number by 365 using the // operator to get the number of whole years. We can get the number of remaining days using the % operator and divide this by 30 using the // operator to get the number of whole months.
def get_time_diff(dataframe):
    # Column 2 holds the difference between the two dates as a Timedelta.
    dataframe[2] = dataframe[1] - dataframe[0]
    diff_dataframe = pd.DataFrame(columns=["year_diff", "month_diff"])
    for i in dataframe.index:
        days = dataframe.loc[i, 2].days
        year_diff = days // 365            # whole 365-day years
        month_diff = (days % 365) // 30    # whole 30-day months in the remainder
        diff_dataframe.loc[i] = [year_diff, month_diff]
    return diff_dataframe
An example output from using these functions would be
       start         end  days_diff  year_diff  month_diff
0 2019-10-15  2021-08-11   666 days          1          10
1 2020-02-11  2022-10-13   975 days          2           8
2 2018-12-17  2020-09-16   639 days          1           9
3 2017-01-03  2017-01-28    25 days          0           0
4 2019-12-21  2022-03-10   810 days          2           2
5 2018-08-08  2019-05-07   272 days          0           9
6 2017-06-18  2020-08-01  1140 days          3           1
7 2017-11-14  2020-04-17   885 days          2           5
8 2019-08-19  2020-05-10   265 days          0           8
9 2018-05-05  2020-09-08   857 days          2           4
Note: This gives the number of whole years and whole months only. Hence, if there is a remainder of 29 days, one day short of a 30-day month, it will not be counted.
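If the loop is too slow on a large frame, the same fixed-length arithmetic can probably be done on whole columns at once. This is only a sketch: it assumes the columns are named start and end (as in the question) rather than 0 and 1, and it uses three rows from the example output above as sample data.

```python
import pandas as pd

# Three rows copied from the example output above.
df = pd.DataFrame({
    "start": pd.to_datetime(["2019-10-15", "2017-01-03", "2017-06-18"]),
    "end": pd.to_datetime(["2021-08-11", "2017-01-28", "2020-08-01"]),
})

# Difference in whole days, as an integer column.
days = (df["end"] - df["start"]).dt.days

# Same approximation as the loop: 365-day years and 30-day months.
df["year_diff"] = days // 365
df["month_diff"] = (days % 365) // 30

print(df)
```

The results should match the loop version row for row, since the arithmetic is identical; only the iteration is replaced by column operations.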
Upvotes: 0
Reputation: 13387
Try:
df["year_diff"] = df["end"].dt.year.sub(df["start"].dt.year)
df["month_diff"] = df["end"].dt.month.sub(df["start"].dt.month)
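Here is a minimal sketch of that subtraction run on the sample data from the question. One caveat: the desired month_diff in the question is one higher than the plain month subtraction, so the month_diff_inclusive column (a hypothetical name, not from the question) adds 1 on the assumption that the end month should be counted too. Drop it if that is not what you want.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "start": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03",
                             "2015-01-04", "2015-01-05"]),
    "end": pd.to_datetime(["2017-01-01", "2015-06-02", "2015-12-03",
                           "2020-11-25", "2025-07-27"]),
})

# Subtract the calendar fields directly; the day of month is ignored.
df["year_diff"] = df["end"].dt.year.sub(df["start"].dt.year)
df["month_diff"] = df["end"].dt.month.sub(df["start"].dt.month)

# Assumption: the asker counts the end month as well, hence the +1.
df["month_diff_inclusive"] = df["month_diff"] + 1

print(df[["year_diff", "month_diff", "month_diff_inclusive"]])
```

Note that month_diff can come out negative when the end month is earlier in the year than the start month, since the two fields are subtracted independently.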
Upvotes: 2