Mainland
Mainland

Reputation: 4604

Python Identify US holidays in a timeseries dataframe

I have a dataframe with years of data. I want to detect, assign True/False or 1/0 if it is a Holiday.

My code:

df = pd.DataFrame(index=['2004-10-01', '2004-10-02', '2004-10-03', '2004-10-04', '2004-10-05',
       '2004-10-06', '2004-10-07', '2004-10-08', '2004-10-09', '2004-10-10',
       '2018-07-25', '2018-07-26', '2018-07-27', '2018-07-28', '2018-07-29',
       '2018-07-30', '2018-07-31', '2018-08-01', '2018-08-02', '2018-08-03'])

import holidays
# Detect US holidays
hldys = holidays.country_holidays('US')
# I have to apply this to each date in the dataframe index
df['isHoliday?'] = df.index.map(lambda x: hldy.get(x))

Present output:

            isHoliday?
2004-10-01  None
2004-10-02  None
2004-10-03  None
2004-10-04  None
2004-10-05  None
2004-10-06  None
2004-10-07  None
2004-10-08  None
2004-10-09  None
2004-10-10  None
2018-07-25  None
2018-07-26  None
2018-07-27  None
2018-07-28  None
2018-07-29  None
2018-07-30  None
2018-07-31  None
2018-08-01  None
2018-08-02  None

Update I found the solution

us_hldys = holidays.country_holidays('US')
df['isHoliday?'] = df.index.to_series().apply(lambda x: x in us_hldys)

isHoliday?
2004-10-01  False
2004-10-02  False
2004-10-03  False
2004-10-04  False
2004-10-05  False
2004-10-06  False
2004-10-07  False
2004-10-08  False
2004-10-09  False
2004-10-10  False
2018-07-25  False
2018-07-26  False
2018-07-27  False
2018-07-28  False
2018-07-29  False
2018-07-30  False
2018-07-31  False
2018-08-01  False
2018-08-02  False
2018-08-03  False

Upvotes: 2

Views: 1275

Answers (2)

Valdi_Bo
Valdi_Bo

Reputation: 31011

I created the source DataFrame slightly other way and took date range containing some US holidays:

df = pd.DataFrame({'date': [
    '2004-10-03', '2004-10-04', '2004-10-05', '2004-10-06', '2004-10-07',
    '2004-10-08', '2004-10-09', '2004-10-10', '2004-10-11', '2004-10-12',
    '2018-07-03', '2018-07-04', '2018-07-05', '2018-07-27', '2018-07-28',
    '2018-07-30', '2018-07-31', '2018-09-02', '2018-09-03', '2018-09-04']})
df.date = pd.to_datetime(df.date)  # Convert to DateTime

You don't need to use any external packages, as all you need is in Pandas:

from pandas.tseries.holiday import USFederalHolidayCalendar
cal = USFederalHolidayCalendar()    # US calendar
hol = cal.holidays(start=df.date.min(), end=df.date.max())    # US holidays

To create an additional column stating whether each date is a US holiday, run:

df['isHoliday'] = df.date.isin(hol)

The result is:

         date  isHoliday
0  2004-10-03      False
1  2004-10-04      False
2  2004-10-05      False
3  2004-10-06      False
4  2004-10-07      False
5  2004-10-08      False
6  2004-10-09      False
7  2004-10-10      False
8  2004-10-11       True
9  2004-10-12      False
10 2018-07-03      False
11 2018-07-04       True
12 2018-07-05      False
13 2018-07-27      False
14 2018-07-28      False
15 2018-07-30      False
16 2018-07-31      False
17 2018-09-02      False
18 2018-09-03       True
19 2018-09-04      False

As you can see, as holidays are marked:

  • 2004-10-11 - Columbus Day
  • 2018-07-04 - Independence Day
  • 2018-09-03 - Labor Day

Upvotes: 1

Corralien
Corralien

Reputation: 120559

A different strategy is to pre-calculate holidays to avoid apply:

df.index = pd.to_datetime(df.index)
hol = pd.Series(holidays.country_holidays('US',  years=range(df.index.min().year,
                                                             df.index.max().year+1)))
df['isHoliday?'] = df.index.isin(hol)

Output:

>>> df
            isHoliday?
2004-10-01       False
2004-10-02       False
2004-10-03       False
2004-10-04       False
2004-10-05       False
2004-10-06       False
2004-10-07       False
2004-10-08       False
2004-10-09       False
2004-10-10       False
2018-07-25       False
2018-07-26       False
2018-07-27       False
2018-07-28       False
2018-07-29       False
2018-07-30       False
2018-07-31       False
2018-08-01       False
2018-08-02       False
2018-08-03       False

>>> hol.sort_index()
2004-01-01                New Year's Day
2004-01-19    Martin Luther King Jr. Day
2004-02-16         Washington's Birthday
2004-05-31                  Memorial Day
2004-07-04              Independence Day
                         ...            
2018-10-08                  Columbus Day
2018-11-11                  Veterans Day
2018-11-12       Veterans Day (Observed)
2018-11-22                  Thanksgiving
2018-12-25                 Christmas Day
Length: 169, dtype: object

Upvotes: 3

Related Questions