Reputation: 189
I need to find the median month value between two dates in a date frame. I am simplifying the case by showing four examples.
import pandas as pd
import numpy as np
import datetime
df=pd.DataFrame([["1/31/2016","3/1/2016"],
["6/15/2016","7/14/2016"],
["7/14/2016","8/15/2016"],
["8/7/2016","9/6/2016"]], columns=['FromDate','ToDate'])
df['Month'] = df.ToDate.dt.month-df.FromDate.dt.month
I am trying to append a column but I am not getting the desired result.
I need to see these values: [2,6,7,8]
.
Upvotes: 1
Views: 330
Reputation: 73
You need to convert the string to datetime before using dt.month. This line calculates the average month number :
df['Month'] = (pd.to_datetime(df['ToDate']).dt.month +
pd.to_datetime(df['FromDate']).dt.month)//2
print(df)
FromDate ToDate Month
0 1/31/2016 3/1/2016 2
1 6/15/2016 7/14/2016 6
2 7/14/2016 8/15/2016 7
3 8/7/2016 9/6/2016 8
This only works with both dates in the same year.
jpp's solution is fine but will in some cases give the wrong answer:
["1/1/2016","3/1/2016"] one would expect 2 because February is between January and March, but jpp's will give 1 corresponding to January.
Upvotes: 0
Reputation: 164693
You can calculate the average date explicitly by adding half the timedelta
between 2 dates to the earlier date. Then just extract the month:
# convert to datetime if necessary
df[df.columns] = df[df.columns].apply(pd.to_datetime)
# calculate mean date, then extract month
df['Month'] = (df['FromDate'] + (df['ToDate'] - df['FromDate']) / 2).dt.month
print(df)
FromDate ToDate Month
0 2016-01-31 2016-03-01 2
1 2016-06-15 2016-07-14 6
2 2016-07-14 2016-08-15 7
3 2016-08-07 2016-09-06 8
Upvotes: 3