Frank0013
Frank0013

Reputation: 61

How to get rid of MonthEnds type

I am trying to get the delta in months between a starting date and an ending date within Pandas DataFrame. The result is not totally satisfying...

First, the outcome is some sort of Datetime type in the form of <[value] * MonthEnds>. I can't use this to calculate with. First question is how to convert this to an integer. I tried the .n attribute but then I get the following error:

AttributeError: 'Series' object has no attribute 'n'  

Second, the outcome is 'missing' one month. Can this be avoided by using another solution/method? Or should I just add 1 month to the answer?

To support my questions I created some simplified code:

dates = [{'Start':'1-1-2020', 'End':'31-10-2020'}, {'Start':'1-2-2020', 'End':'30-11-2020'}]
df = pd.DataFrame(dates)

df['Start'] = pd.to_datetime(df['Start'], dayfirst=True)
df['End'] = pd.to_datetime(df['End'], dayfirst=True)
df['Duration'] = (df['End'].dt.to_period('M') - df['Start'].dt.to_period('M'))
df

This results in:

    Start       End         Duration
0   2020-01-01  2020-10-31  <9 * MonthEnds>
1   2020-02-01  2020-11-30  <9 * MonthEnds>

The preferred result would be:

    Start       End         Duration
0   2020-01-01  2020-10-31  10
1   2020-02-01  2020-11-30  10

Upvotes: 1

Views: 400

Answers (2)

Kuldip Chaudhari
Kuldip Chaudhari

Reputation: 1112

Try This

dates = [{'Start':'1-1-2020', 'End':'31-10-2020'}, {'Start':'1-2-2020', 'End':'30-11-2020'}]
df = pd.DataFrame(dates)

df['Start'] = pd.to_datetime(df['Start'], dayfirst=True)
df['End'] = pd.to_datetime(df['End'], dayfirst=True)
df['Duration'] = (df['End'] - df['Start']).apply(lambda x:x.days//30)
print(df)

Upvotes: 0

thorntonc
thorntonc

Reputation: 2126

Subtract the start-date from the end-date and convert the time delta to months.

import pandas as pd

dates = [{'Start':'1-1-2020', 'End':'31-10-2020'}, {'Start':'1-2-2020', 'End':'30-11-2020'}]
df = pd.DataFrame(dates)
df['Start'] = pd.to_datetime(df['Start'], dayfirst=True)
df['End'] = pd.to_datetime(df['End'], dayfirst=True)
df['Duration'] = (df['End']-df['Start']).astype('<m8[M]').astype(int)+1
print(df)

Output:

       Start        End  Duration
0 2020-01-01 2020-10-31        10
1 2020-02-01 2020-11-30        10

Upvotes: 2

Related Questions