Chrestomanci
Chrestomanci

Reputation: 221

Finding end of a month in a column of pandas dataframe

To help categorize a large dataset by months, I am trying to convert all actual dates of an entry into the end of the month.

I saw similar questions to this and used the code I found, but it does not seem to work if the date happened to already be the end of the month (that would result in the next month's end being calculated instead).

df['CalcEnd'] = pd.to_datetime(df['ActualDate'], format="%m/%d/%Y") + MonthEnd(1)

That leads to:

    ActualDate    CalcEnd
    7/1/2019      7/31/2019
    7/2/2019      7/31/2019
    7/31/2019     8/31/2019

The third entry should be returning 7/31/2019.

I tried to use numpy to only use the CalcEnd if the date is not already the end of the month to avoid this issue, but for some reason CalcEnd resulted in a weird series of numbers if it wasn't already the end of month.

Specifically I tried:

def isMonthEnd(date):
    return date + pd.offsets.MonthEnd(0) == date

df['EndCheck'] = isMonthEnd(pd.to_datetime(df['ActualDate'], format="%m/%d/%Y"))
df['CalcEnd'] = pd.to_datetime(df['ActualDate'], format="%m/%d/%Y") + MonthEnd(1)
df['End'] = np.where(df['EndCheck']==False, df['CalcEnd'], df['ActualDate'])

When EndCheck is False, instead of showing 7/31/2019, it shows 1564531200000000000.

But when it is True, it correctly shows ActualDate as 7/31/2019.

Any advice on:

1) How to convert all given dates in a dataframe to the end of the month, even when the given date is already the end of the month; and

2) Why the np.where statement is not working when attempting to use the calculated column

.... would be greatly appreciated!

I was able to find a workaround by just writing the dataframe to a csv, and reading that new csv back into the dataframe before creating End; it seems to resolve the issue with the np.where statement returning 1564531200000000000. However, I'm hoping there is a more elegant solution.

Thank you!

Upvotes: 0

Views: 736

Answers (2)

Andy L.
Andy L.

Reputation: 25239

This behavior mentioned in docs:

When n is not 0, if the given date is not on an anchor point, it snapped to the next(previous) anchor point, and moved |n|-1 additional steps forwards or backwards. . If the given date is on an anchor point, it is moved |n| points forwards or backwards.

You don't need np.where. The fix is simple. If you go forward, just subtract one days before adding anchor. If you go backward, add one day before subtracting anchor

You go forward to MonthEnd, so just subtract one day before adding anchor

df['CalcEnd'] = df['ActualDate']  - pd.offsets.Day() + pd.offsets.MonthEnd(1)

Out[370]:
  ActualDate    CalcEnd
0 2019-07-01 2019-07-31
1 2019-07-02 2019-07-31
2 2019-07-31 2019-07-31

Upvotes: 2

pythonic833
pythonic833

Reputation: 3224

Just check whether a date is still in the same month if you add one day and make the offset depending on the result

def to_end_of_month(date):
    if (date + pd.offsets.Day(1)).month == date.month:
        return date + pd.offsets.MonthEnd(1)
    else:
        return date + pd.offsets.MonthEnd(0)

df['CalcEnd'] = df['ActualDate'].apply(to_end_of_month)

result

    ActualDate  CalcEnd
0   2019-07-01  2019-07-31
1   2019-07-02  2019-07-31
2   2019-07-31  2019-07-31

Upvotes: 2

Related Questions