Reputation: 69
I have a big data frame (the fragment is below):
start_date finish_date
2842 2019-02-16 19:35:55.125766+00:00 2019-06-23 08:10:42.867492+00:00
2844 2019-05-29 18:03:54.230822+00:00 2019-06-05 08:06:37.896891+00:00
2846 2019-03-26 10:29:14.626280+00:00 2019-03-28 03:00:12.350836+00:00
2847 2019-04-22 16:29:30.480639+00:00 2019-04-24 18:02:09.869749+00:00
2852 2019-06-28 11:32:32.104132+00:00 2019-07-07 20:15:47.000026+00:00
2853 2019-03-21 17:20:50.030024+00:00 2019-03-27 03:18:26.652882+00:00
2854 2019-07-12 13:46:24.119986+00:00 2019-09-16 14:36:16.995393+00:00
start_date and finish_date are datetime64 format.
I need to create a new column with the result of calculation of how many months between start_date
and finish_date
.
for each string I used
len(pd.date_range(start=df.loc[2844, 'start_date'], end=df.loc[2844, 'finish_date'], freq='M'))
But I dont know how to apply this to every row ... row by row. I guess some lambda must be used...
This:
df['length'] = pd.date_range(start=df['start_date'], end=df['finish_date'], freq='M')
rises an error...
expected result:
start_date finish_date length
2842 2019-02-16 19:35:55.125766+00:00 2019-06-23 08:10:42.867492+00:00 4
2844 2019-05-29 18:03:54.230822+00:00 2019-06-05 08:06:37.896891+00:00 1
2846 2019-03-26 10:29:14.626280+00:00 2019-03-28 03:00:12.350836+00:00 0
2847 2019-04-22 16:29:30.480639+00:00 2019-04-24 18:02:09.869749+00:00 0
2852 2019-06-28 11:32:32.104132+00:00 2019-07-07 20:15:47.000026+00:00 1
2853 2019-03-21 17:20:50.030024+00:00 2019-03-27 03:18:26.652882+00:00 0
2854 2019-07-12 13:46:24.119986+00:00 2019-09-16 14:36:16.995393+00:00 2
Upvotes: 1
Views: 368
Reputation: 24304
Since both dates are of dtype datetime you can calculate the difference between months by using Series.dt.month
attribute:
df['length']=(df['finish_date'].dt.month-df['start_date'].dt.month).abs()
Upvotes: 2
Reputation: 1005
Finding a Difference between dates for month can cause issue with rounding so I have given both the result for your uderstanding:
import pandas as pd
data = {
"start_date" :["2019-02-16 19:35:55.125766+00:00", "2019-05-29 18:03:54.230822+00:00", "2019-03-26 10:29:14.626280+00:00", "2019-04-22 16:29:30.480639+00:00", "2019-06-28 11:32:32.104132+00:00", "2019-03-21 17:20:50.030024+00:00", "2019-07-12 13:46:24.119986+00:00"],
"finish_date" : ["2019-06-23 08:10:42.867492+00:00", "2019-06-05 08:06:37.896891+00:00", "2019-03-28 03:00:12.350836+00:00", "2019-04-24 18:02:09.869749+00:00", "2019-07-07 20:15:47.000026+00:00", "2019-03-27 03:18:26.652882+00:00", "2019-09-16 14:36:16.995393+00:00"]
}
df = pd.DataFrame(data)
# you can skip this if already in datetime format
df['start_date'] = pd.to_datetime(df['start_date'])
df['finish_date'] = pd.to_datetime(df['finish_date'])
df["months"] = df.finish_date.dt.to_period('M').astype(int) - df.start_date.dt.to_period('M').astype(int)
df["months_no_rounding"] = df.finish_date.dt.to_period('M') - df.start_date.dt.to_period('M')
print(df)
Result:
Upvotes: 2
Reputation: 260335
Be careful as your difference has issues with rounding! For instance, the difference between 2019-05-29 and 2019-06-05 is only 6 days, but you calculate it as 1 month. This might be wanted… or not!
Here is an alternative to approach a real difference in months (an absolute difference in months is not possible as months are variable, from 29 to 31):
df['diff'] = (df['finish_date']-df['start_date']).dt.days//30
output:
start_date finish_date diff
2842 2019-02-16 19:35:55.125766 2019-06-23 08:10:42.867492 4
2844 2019-05-29 18:03:54.230822 2019-06-05 08:06:37.896891 0
2846 2019-03-26 10:29:14.626280 2019-03-28 03:00:12.350836 0
2847 2019-04-22 16:29:30.480639 2019-04-24 18:02:09.869749 0
2852 2019-06-28 11:32:32.104132 2019-07-07 20:15:47.000026 0
2853 2019-03-21 17:20:50.030024 2019-03-27 03:18:26.652882 0
2854 2019-07-12 13:46:24.119986 2019-09-16 14:36:16.995393 2
and as float: df['diff'] = (df['finish_date']-df['start_date']).dt.days/30
start_date finish_date diff
2842 2019-02-16 19:35:55.125766 2019-06-23 08:10:42.867492 4.200000
2844 2019-05-29 18:03:54.230822 2019-06-05 08:06:37.896891 0.200000
2846 2019-03-26 10:29:14.626280 2019-03-28 03:00:12.350836 0.033333
2847 2019-04-22 16:29:30.480639 2019-04-24 18:02:09.869749 0.066667
2852 2019-06-28 11:32:32.104132 2019-07-07 20:15:47.000026 0.300000
2853 2019-03-21 17:20:50.030024 2019-03-27 03:18:26.652882 0.166667
2854 2019-07-12 13:46:24.119986 2019-09-16 14:36:16.995393 2.200000
Upvotes: 0