Verbamore
Verbamore

Reputation: 189

How to find the median month between two dates?

I need to find the median month value between two dates in a date frame. I am simplifying the case by showing four examples.

import pandas as pd
import numpy as np
import datetime
df=pd.DataFrame([["1/31/2016","3/1/2016"],
                 ["6/15/2016","7/14/2016"],
                 ["7/14/2016","8/15/2016"], 
                 ["8/7/2016","9/6/2016"]], columns=['FromDate','ToDate'])

df['Month'] = df.ToDate.dt.month-df.FromDate.dt.month

I am trying to append a column but I am not getting the desired result. I need to see these values: [2,6,7,8].

Upvotes: 1

Views: 330

Answers (2)

Peter Sorensen
Peter Sorensen

Reputation: 73

You need to convert the string to datetime before using dt.month. This line calculates the average month number :

df['Month']   = (pd.to_datetime(df['ToDate']).dt.month +
                 pd.to_datetime(df['FromDate']).dt.month)//2
print(df)
    FromDate     ToDate  Month
0  1/31/2016   3/1/2016      2
1  6/15/2016  7/14/2016      6
2  7/14/2016  8/15/2016      7
3   8/7/2016   9/6/2016      8

This only works with both dates in the same year.

jpp's solution is fine but will in some cases give the wrong answer:

["1/1/2016","3/1/2016"] one would expect 2 because February is between January and March, but jpp's will give 1 corresponding to January.

Upvotes: 0

jpp
jpp

Reputation: 164693

You can calculate the average date explicitly by adding half the timedelta between 2 dates to the earlier date. Then just extract the month:

# convert to datetime if necessary
df[df.columns] = df[df.columns].apply(pd.to_datetime)

# calculate mean date, then extract month
df['Month'] = (df['FromDate'] + (df['ToDate'] - df['FromDate']) / 2).dt.month

print(df)

    FromDate     ToDate  Month
0 2016-01-31 2016-03-01      2
1 2016-06-15 2016-07-14      6
2 2016-07-14 2016-08-15      7
3 2016-08-07 2016-09-06      8

Upvotes: 3

Related Questions