Holiday ranking logic - translating from English to Python

Question

I have been trying to wrap my head around a task that is simple in theory, but I am having real difficulty coding it up. It is a kind of code test / brain teaser!

On page 14 of this document there is a holiday code rule set that I am trying to translate from English to Python. I essentially want a column that states which rank (1-16; I am ignoring anything beyond that) each date corresponds to: https://www.gasgovernance.co.uk/sites/default/files/ggf/2020-02/3.0%20Modelling%20Approach%202020_Final.pdf

Given a dataset spanning many years, for example 2012 to 2022, how would I code up this rule set in Python? I don't want to do it manually, as I might want to do the same thing for another country or something, so I want to have the actual logic.

Below is my attempt so far. I start off by getting the holiday calendar for Great Britain. I then create a 'rank' column as a placeholder. After that I try to derive the ranks from these holidays, but the logic is quite difficult (for me).

I am wondering whether there is a better way, or how exactly to do this.

import holidays
import numpy as np
import pandas as pd

dates = pd.date_range(start='2012-01-01', end='2022-12-31', freq='D')
df = pd.DataFrame({'Value': np.random.rand(len(dates))}, index=dates)

def get_british_holidays(df):
    # Look up every date in the UK holiday calendar and attach the holiday name
    gb_holidays = holidays.UnitedKingdom()
    # dtype='object' so the Series can hold holiday-name strings
    holiday_dates = pd.Series(index=df.index, dtype='object')
    for single_date in holiday_dates.index:
        if single_date in gb_holidays:
            holiday = gb_holidays[single_date]
            # Split multiple names on commas and drop the Northern Ireland-only ones
            holiday_parts = [part.strip() for part in holiday.split(',')]
            holiday_parts = [part for part in holiday_parts if '[Northern Ireland]' not in part]
            holiday_dates.loc[single_date] = ', '.join(holiday_parts)
    # Dates whose only holidays were filtered out become missing values
    holiday_dates = holiday_dates.replace('', np.nan)
    holiday_dates = holiday_dates.to_frame(name='holiday')
    return pd.merge(df, holiday_dates, left_index=True, right_index=True, how='left')


df = get_british_holidays(df)
df['rank'] = 0

# first attempt
for date, row in df.loc[df.index.month.isin([12, 1])].iterrows():
    if row['holiday'] == 'Christmas Day':
        # Timedelta arithmetic stays valid even if the offset crosses a month boundary
        if date.dayofweek <= 3:
            holiday_period_start = date - pd.Timedelta(days=3 + date.dayofweek)
        else:
            holiday_period_start = date - pd.Timedelta(days=date.dayofweek)

    if row['holiday'] == 'New Year Holiday [Scotland]':
        holiday_period_end = date + pd.Timedelta(days=date.dayofweek)

# second attempt
for date in df.index[df.index.month.isin([12, 1])]:
    # January dates belong to the season that started the previous December
    season_year = date.year if date.month == 12 else date.year - 1
    season_start = pd.Timestamp(season_year, 12, 24)
    season_end = pd.Timestamp(season_year + 1, 1, 2)

    # Most specific conditions first, otherwise the broad range shadows them
    if date == pd.Timestamp(season_year, 12, 25):
        df.loc[date, 'rank'] = 1
    elif (pd.Timestamp(season_year, 12, 26) <= date <= pd.Timestamp(season_year, 12, 27)
            or pd.Timestamp(season_year + 1, 1, 1) <= date <= season_end):
        df.loc[date, 'rank'] = 2
    elif date.dayofweek < 5 and season_start <= date <= pd.Timestamp(season_year + 1, 1, 1):
        df.loc[date, 'rank'] = 3
    elif season_start <= date <= season_end:
        df.loc[date, 'rank'] = 5
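
One possible way to keep the logic reusable for another country is to separate the rules from the loop: compute the period boundaries once per season, then walk an ordered list of (rank, condition) pairs and take the first match. The sketch below uses only the standard library so the idea stays visible. The helper names (`holiday_season_year`, `christmas_period`, `rank_for`), the period definition, and every rank and condition in `RULES` are hypothetical placeholders, not the rule set from the linked document, and would need to be replaced with the real rules.

```python
from datetime import date, timedelta


def holiday_season_year(d):
    """Return the year of the December that starts the Christmas season of d.

    January dates are assigned to the previous year's season.
    """
    return d.year - 1 if d.month == 1 else d.year


def christmas_period(season_year):
    """Return (start, end) of a PLACEHOLDER Christmas period.

    Assumption for illustration only: the period runs from the Monday of
    Christmas week up to and including 2 January.
    """
    christmas = date(season_year, 12, 25)
    # weekday() is 0 for Monday, so this steps back to the Monday of that week.
    # Subtracting a timedelta is safe across month boundaries.
    start = christmas - timedelta(days=christmas.weekday())
    end = date(season_year + 1, 1, 2)
    return start, end


# Ordered from most specific to most general: the first matching rule wins.
# Ranks and conditions are hypothetical placeholders.
RULES = [
    (1, lambda d, start, end: (d.month, d.day) == (12, 25)),
    (2, lambda d, start, end: (d.month, d.day) in {(12, 26), (1, 1)}),
    (3, lambda d, start, end: start <= d <= end and d.weekday() < 5),
    (5, lambda d, start, end: start <= d <= end),
]


def rank_for(d, rules=RULES, default=0):
    """Return the rank of the first rule that matches d, else the default."""
    start, end = christmas_period(holiday_season_year(d))
    for rank, matches in rules:
        if matches(d, start, end):
            return rank
    return default
```

With the DataFrame from the question, something like `df['rank'] = [rank_for(ts.date()) for ts in df.index]` would fill the column. Because the rules live in a plain list, covering another country or more ranks would mostly mean swapping in a different list and period function rather than rewriting the loop.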

