Reputation: 67
I have the next example
import pandas as pd
data = {
'id_number': ['000001000', '000009795', '000011722'],
'first_date': ['2022-01-08 13:41:00', '2022-11-07 09:16:00', '2022-02-15 12:46:00'],
'last_date': ['2023-06-13 16:33:00', '2022-11-30 12:59:00', '2022-06-21 11:42:00']
}
df_test = pd.DataFrame(data)
# Try to calculate the difference in months
df_test['first_date'] = pd.to_datetime(df_test['first_date'])
df_test['last_date'] = pd.to_datetime(df_test['last_date'])
df_test['months_difference'] = (df_test['last_date'] - df_test['first_date']) // pd.Timedelta('1 month')
df_test
But the error is
ValueError: invalid unit abbreviation: month
Is there any way to calculate that difference?
Upvotes: 0
Views: 97
Reputation: 67
I found this solution:
# Create the DataFrame
data = {
'id_number': ['000001000', '000009795', '000011722'],
'first_date': ['2022-01-08 13:41:00', '2022-11-07 09:16:00', '2022-02-15 12:46:00'],
'last_date': ['2023-06-13 16:33:00', '2022-11-30 12:59:00', '2022-06-21 11:42:00']
}
df_test = pd.DataFrame(data)
# Convert date columns to datetime
df_test['first_date'] = pd.to_datetime(df_test['first_date'])
df_test['last_date'] = pd.to_datetime(df_test['last_date'])
# Calculate difference in months based on days
df_test['months_difference'] = ((df_test['last_date'] - df_test['first_date']).dt.days) // 30.44
#30.44 is avg of days per month in a non-leap yer
or using this:
df_test['months_difference'] = df_test['last_date'].dt.to_period('M').astype('int64') - df_test['first_date'].dt.to_period('M').astype('int64')
Upvotes: 0
Reputation: 25634
another option: convert datetime to period "M" (months), then to integer (gives you the number of months since 1970), then do the subtraction:
df_test["last_date"].dt.to_period("M").astype(int)-df_test["first_date"].dt.to_period("M").astype(int)
0 17
1 0
2 4
dtype: int64
Upvotes: 1
Reputation: 77
You can try the following:
df_test['months_difference'] = (df_test.last_date -df_test.first_date)/numpy.timedelta64(1, 'M')
This will give you the months difference as a float, if you want them as integers, add
df_test['months_difference'] = df_test['months_difference'].astype(int)
Upvotes: 0