user_12

Reputation: 2119

How to get the time difference between two datetime columns in pandas?

I have two different columns in my dataset,

       start    end
0   2015-01-01  2017-01-01
1   2015-01-02  2015-06-02
2   2015-01-03  2015-12-03
3   2015-01-04  2020-11-25
4   2015-01-05  2025-07-27

I want the difference between start and end in a specific way, here's my desired output.

year_diff  month_diff
        2           1
        0           6
        0          12
        5          11
       10           7

Here the day is not important to me, only the month and year. I've tried using to_period to get the difference, but it returns the difference in months only. How can I achieve my desired output?

df['end'].dt.to_period('M') - df['start'].dt.to_period('M')

Upvotes: 0

Views: 84

Answers (2)

William Hicklin

Reputation: 127

This solution assumes that a year always has 365 days and a month always has 30 days. If the dates are stored as strings, convert them to datetime objects first. In a pandas DataFrame this can be done like so:

import pandas as pd

def to_datetime(dataframe):
    # Parse the start (column 0) and end (column 1) strings as datetimes.
    new_dataframe = pd.DataFrame()
    new_dataframe[0] = pd.to_datetime(dataframe[0], format="%Y-%m-%d")
    new_dataframe[1] = pd.to_datetime(dataframe[1], format="%Y-%m-%d")
    return new_dataframe

Next, column 0 can be subtracted from column 1 to give the difference in days. We can divide this number by 365 using the // operator to get the number of whole years. We can then get the number of remaining days using the % operator and divide this by 30 using the // operator to get the number of whole months.

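def get_time_diff(dataframe):
    # Elapsed time between end (column 1) and start (column 0), as Timedeltas.
    dataframe[2] = dataframe[1] - dataframe[0]
    diff_dataframe = pd.DataFrame(columns=["year_diff", "month_diff"])
    # Iterate over the index labels so this also works without a default RangeIndex.
    for i in dataframe.index:
        days = dataframe.loc[i, 2].days
        year_diff = days // 365            # whole 365-day years
        month_diff = (days % 365) // 30    # whole 30-day months in the remainder
        diff_dataframe.loc[i] = [year_diff, month_diff]

    return diff_dataframe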

An example output from using these functions would be

       start        end days_diff year_diff month_diff
0 2019-10-15 2021-08-11  666 days         1         10
1 2020-02-11 2022-10-13  975 days         2          8
2 2018-12-17 2020-09-16  639 days         1          9
3 2017-01-03 2017-01-28   25 days         0          0
4 2019-12-21 2022-03-10  810 days         2          2
5 2018-08-08 2019-05-07  272 days         0          9
6 2017-06-18 2020-08-01 1140 days         3          1
7 2017-11-14 2020-04-17  885 days         2          5
8 2019-08-19 2020-05-10  265 days         0          8
9 2018-05-05 2020-09-08  857 days         2          4

Note: This gives the number of whole years and whole months only. Hence, if there is a remainder of 29 days, one day short of a month, it will not be counted.
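The same day-based arithmetic can also be written without a Python loop. The following is only a sketch under the same 365-day and 30-day assumptions; the function name get_time_diff_vectorized and the start/end column names are illustrative and not part of the code above.

```python
import pandas as pd

def get_time_diff_vectorized(start, end):
    """Hypothetical helper: whole 365-day years and 30-day months between two datetime Series."""
    # .dt.days turns each Timedelta into an integer number of days.
    days = (end - start).dt.days
    return pd.DataFrame({
        "days_diff": days,
        "year_diff": days // 365,           # whole 365-day years
        "month_diff": (days % 365) // 30,   # whole 30-day months in the remainder
    })

# Two rows taken from the example output above.
dates = pd.DataFrame({
    "start": pd.to_datetime(["2019-10-15", "2017-01-03"]),
    "end": pd.to_datetime(["2021-08-11", "2017-01-28"]),
})
result = get_time_diff_vectorized(dates["start"], dates["end"])
print(result)
```

Because it works on whole columns, this should scale better than filling the result row by row, but it shares the same limitation: real months and years vary in length, so the result is an approximation.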

Upvotes: 0

Georgina Skibinski

Reputation: 13387

Try:

df["year_diff"] = df["end"].dt.year.sub(df["start"].dt.year)
df["month_diff"] = df["end"].dt.month.sub(df["start"].dt.month)
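Note that subtracting the year and month parts independently can give a negative month_diff when the end month is earlier in the year than the start month (for example, November to March). One possible way around this, shown here as a sketch rather than as part of the answer above, is to compute the total number of calendar months first and then split it into years and months. The intermediate name total_months is illustrative, and the sketch assumes end is never earlier than start.

```python
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "start": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04", "2015-01-05"]),
    "end": pd.to_datetime(["2017-01-01", "2015-06-02", "2015-12-03", "2020-11-25", "2025-07-27"]),
})

# Total calendar months between the two dates, ignoring the day of the month.
total_months = (
    (df["end"].dt.year - df["start"].dt.year) * 12
    + (df["end"].dt.month - df["start"].dt.month)
)

# Split into whole years and leftover months (always 0..11 when end >= start).
df["year_diff"] = total_months // 12
df["month_diff"] = total_months % 12
print(df)
```

For the question's data this gives month differences of 0, 5, 11, 10 and 6. The desired output in the question is one month higher in every row, which suggests the end month is being counted inclusively; that adjustment is not applied here.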

Upvotes: 2
