Larry Flores
Larry Flores

Reputation: 73

How to get season and if a date range falls on a US Holiday with pandas?

I have a pandas dataframe that has 2 columns named date_x and date_y. I want to pull what season date_x is in (winter, spring, summer, or fall) in a separate column and then determine if a holiday falls with the range of dates exclusive of date_y. I have seen some potential solutions around that have done seasons by breaking down the year into days but I can't seem to make that work for me. I am still figuring out what pandas can do. Another problem I am faced with is how to account for holidays like Easter that change every year. Any help is truly appreciated.

I have also played around with trying to make this solution work for me but can't figure out how to adapt it to create a new column that works off of date_x instead of today's date

Below is an image of what I am striving for.

Date_x, Date_y, what season Date_x is in, What holiday, if any, is within the range of dates, exclusive of Date_y

Upvotes: 2

Views: 1341

Answers (3)

David Erickson
David Erickson

Reputation: 16683

Let's say that you have a dataframe with dates that not only range multiple holidays, but they range multiple seasons:

        Date_x      Date_y
0   2020-12-22  2021-01-01
1   2020-06-20  2020-07-11
3   2020-02-11  2020-03-27
4   2020-05-22  2020-06-27

In order to get 1. Season and 2. Holiday:

  1. I built off the link you shared to customize seasons
  2. I tried to avoid "third-party" libraries and chose to use the USFederalHolidayCalendar from the pandas holiday library; because, I thought that would be more reliable; however, I do not have much experieince with holiday libraries. Also, there are multiple calendars that could be used from the pandas library. From there, I used the get_season(x) and get_holiday() function that I created. For the former, I would reference the link in your question, and the latter uses list comprehension to pull in holidays into your dataframe from the holidays dataframe I created.

from pandas.tseries.holiday import USFederalHolidayCalendar
from datetime import datetime
import pandas as pd

cal = USFederalHolidayCalendar()
holidays = (pd.DataFrame(cal.holidays(return_name=True), columns=['Holiday'])
            .reset_index()
            .rename({'index': 'Date'}, axis=1))
holidays['Date'] = pd.to_datetime(holidays['Date'])
df= pd.DataFrame({'Date_x': {0: '2020-12-22', 1: '2020-06-20', 2: '2020-02-11', 3: '2020-05-22'},
                  'Date_y': {0: '2021-01-01', 1: '2020-07-11', 2: '2020-03-27', 3: '2020-06-27'}})
df['Date_x'] = pd.to_datetime(df['Date_x'])
df['Date_y'] = pd.to_datetime(df['Date_y'])

Y = 2000 # dummy leap year to allow input X-02-29 (leap day)
seasons = [('Winter', (date(Y,  1,  1),  date(Y,  3, 20))),
           ('Spring', (date(Y,  3, 21),  date(Y,  6, 20))),
           ('Summer', (date(Y,  6, 21),  date(Y,  9, 22))),
           ('Fall', (date(Y,  9, 23),  date(Y, 12, 20))),
           ('Winter', (date(Y, 12, 21),  date(Y, 12, 31)))]

def get_season(x):
    x = x.replace(year=Y)
    return next(season for season, (start, end) in seasons
                if start <= x <= end)


def get_holiday():
    return pd.DataFrame([(h,y,z) for (h,d) in zip(holidays['Holiday'], holidays['Date'])
     for (y, z) in zip(df['Date_x'], df['Date_y']) if y.date() <= d.date() if d.date() <= z.date()], columns=['Holiday', 'Date_x', 'Date_y'])


s1 = df['Date_x'].apply(lambda x: get_season(x))
s2 = df['Date_y'].apply(lambda x: get_season(x))
df['Season']= [', '.join(list(set([x,y]))) for (x,y) in zip(s1,s2)]
dft = get_holiday()
dft = dft.groupby(['Date_x', 'Date_y'])['Holiday'].apply(lambda x: ', '.join(list(x)))
df = pd.merge(df, dft, how='left', on=['Date_x', 'Date_y'])
df

Out[32]: 
      Date_x     Date_y            Season                     Holiday
0 2020-12-22 2021-01-01            Winter    Christmas, New Years Day
1 2020-06-20 2020-07-11    Summer, Spring                    July 4th
2 2020-02-11 2020-03-27    Spring, Winter              Presidents Day
3 2020-05-22 2020-06-27    Summer, Spring                Memorial Day

Upvotes: 1

John Zwinck
John Zwinck

Reputation: 249123

Convert your columns to datetime type (if they aren't already) and construct two new empty columns:

df.Date_x = pd.to_datetime(df.Date_x)
df.Date_y = pd.to_datetime(df.Date_y)

Adding the Season column is easy once you notice the quarter attribute of datetime Series:

df['Season'] = df.Date_x.dt.quarter.map({1: 'Spring', 2: 'Summer', 3: 'Fall', 4: 'Winter'})

Holidays are obviously a bit more involved. Start with an empty column:

df['Holiday'] = ''

Easter is a special one, let's take care of that first. Pandas has it built in, though you can define your own Holiday rules too.

easter = df.Date_y - pd.tseries.offsets.Easter()

That gives you the Easter before Date_y (I could have used Date_x + Easter() but you said you wanted it exclusive of Date_y). Then:

df.loc[df.Date_x < easter, 'Holiday'] = 'Easter'

That puts "Easter" in the rows where the holiday is within [Date_x, Date_y).

I leave it as an exercise for you to create the holiday rules for the rest of your holidays. Here's some advice on that: Pandas Time Series Holiday Rule Offset

Upvotes: 1

kerasbaz
kerasbaz

Reputation: 1794

I believe you're looking for something like below. Notice that part of this solution uses the answer provided here.

You may have to adjust to account for your exclusive requirement, but this will give you the idea.

import pandas as pd
import numpy as np

import holidays # pip install holidays

from datetime import date, datetime

us_holidays = holidays.UnitedStates()

Y = 2000 # dummy leap year to allow input X-02-29 (leap day)
seasons = [('winter', (date(Y,  1,  1),  date(Y,  3, 20))),
           ('spring', (date(Y,  3, 21),  date(Y,  6, 20))),
           ('summer', (date(Y,  6, 21),  date(Y,  9, 22))),
           ('autumn', (date(Y,  9, 23),  date(Y, 12, 20))),
           ('winter', (date(Y, 12, 21),  date(Y, 12, 31)))]

def get_season(dt):
    if isinstance(dt, datetime):
        dt = dt.date()
    dt = dt.replace(year=Y)
    return next(season for season, (start, end) in seasons
                if start <= dt <= end)

def get_holiday(data):
    return ",".join([us_holidays.get(x) for x in us_holidays[data["Date_X"] : data["Date_Y"]]])

np.random.seed(0)
rng_x = pd.date_range('2020-12-22', periods=365, freq='D')
rng_y = pd.date_range('2020-12-26', periods=365, freq='D')

df = pd.DataFrame({ 'Date_X': rng_x, 'Season_X': "", 'Date_Y': rng_y, 'Season_Y': ""}) 
print(df.head())

df['Season_X'] = df["Date_X"].apply(get_season)
df['Season_Y'] = df["Date_Y"].apply(get_season)
print(df.head())

df['Holiday'] = df.apply(get_holiday, axis=1)
print(df.head())

Output:

      Date_X Season_X     Date_Y Season_Y        Holiday
0 2020-12-22   winter 2020-12-26   winter  Christmas Day
1 2020-12-23   winter 2020-12-27   winter  Christmas Day
2 2020-12-24   winter 2020-12-28   winter  Christmas Day
3 2020-12-25   winter 2020-12-29   winter  Christmas Day
4 2020-12-26   winter 2020-12-30   winter

Upvotes: 2

Related Questions