danny
danny

Reputation: 67

Subtract columns with data of type date in python

I have the next example

import pandas as pd
    
data = {
    'id_number': ['000001000', '000009795', '000011722'],
    'first_date': ['2022-01-08 13:41:00', '2022-11-07 09:16:00', '2022-02-15 12:46:00'],
    'last_date': ['2023-06-13 16:33:00', '2022-11-30 12:59:00', '2022-06-21 11:42:00']
}
df_test = pd.DataFrame(data)

# Try to calculate the difference in months
df_test['first_date'] = pd.to_datetime(df_test['first_date'])
df_test['last_date'] = pd.to_datetime(df_test['last_date'])
df_test['months_difference'] = (df_test['last_date'] - df_test['first_date']) // pd.Timedelta('1 month')

df_test

But the error is

ValueError: invalid unit abbreviation: month

Is there any way to calculate that difference?

Upvotes: 0

Views: 97

Answers (3)

danny
danny

Reputation: 67

I found this solution:

    # Create the DataFrame
data = {
    'id_number': ['000001000', '000009795', '000011722'],
    'first_date': ['2022-01-08 13:41:00', '2022-11-07 09:16:00', '2022-02-15 12:46:00'],
    'last_date': ['2023-06-13 16:33:00', '2022-11-30 12:59:00', '2022-06-21 11:42:00']
}
df_test = pd.DataFrame(data)

# Convert date columns to datetime
df_test['first_date'] = pd.to_datetime(df_test['first_date'])
df_test['last_date'] = pd.to_datetime(df_test['last_date'])

# Calculate difference in months based on days
df_test['months_difference'] = ((df_test['last_date'] - df_test['first_date']).dt.days) // 30.44 
#30.44 is avg of days per month in a non-leap yer

or using this:

df_test['months_difference'] = df_test['last_date'].dt.to_period('M').astype('int64') - df_test['first_date'].dt.to_period('M').astype('int64')

Upvotes: 0

FObersteiner
FObersteiner

Reputation: 25634

another option: convert datetime to period "M" (months), then to integer (gives you the number of months since 1970), then do the subtraction:

df_test["last_date"].dt.to_period("M").astype(int)-df_test["first_date"].dt.to_period("M").astype(int)

0    17
1     0
2     4
dtype: int64

Upvotes: 1

Namrata Banerji
Namrata Banerji

Reputation: 77

You can try the following:

df_test['months_difference'] = (df_test.last_date -df_test.first_date)/numpy.timedelta64(1, 'M')

This will give you the months difference as a float, if you want them as integers, add

df_test['months_difference'] = df_test['months_difference'].astype(int)

Upvotes: 0

Related Questions