13sen1
13sen1

Reputation: 285

Holiday ranking logic - translating from English to Python

I have been trying to wrap my head around a, in theory, simple task, but am having real difficulty coding it up. It is a kind of code test / brain teaser!

On page 14 of this document there is a holiday code ruleset that I am trying to translate from English to Python. I essentially want a column that states which rank (1-16, I am ignoring beyond that) that the particular date corresponds to: https://www.gasgovernance.co.uk/sites/default/files/ggf/2020-02/3.0%20Modelling%20Approach%202020_Final.pdf

If one was to have a dataset of many years length, for example from 2012 to 2022, how would I code up in Python this rule set? I don't want to do it manually as I might want to do the same thing for another country of something, so want to have the actual logic.

Below is my attempt so far. I start off with getting the holiday calendar for Great Britain. I then create a 'rank' column as a placeholder. Then I am trying to reference off these holidays, but the logic is quite difficult (for me).

Wondering if there is a better way or how exactly to do this?

import holidays
import numpy as np
import pandas as pd

dates = pd.date_range(start='2012-01-01', end='2022-12-31', freq='D')
df = pd.DataFrame({'Value': np.random.rand(len(dates))}, index=dates)

def get_british_holidays(df):
    gb_holidays = holidays.UnitedKingdom()
    holiday_dates = pd.Series(index=df.index)
    for single_date in holiday_dates.index:
        if single_date in gb_holidays:
            holiday = gb_holidays[single_date]
            holiday_parts = [part.strip() for part in holiday.split(',')]
            holiday_parts = [part for part in holiday_parts if '[Northern Ireland]' not in part]
            holiday_dates.loc[single_date] = ', '.join(holiday_parts)
    holiday_dates = holiday_dates.replace('', np.nan).fillna(value=np.nan)
    holiday_dates = holiday_dates.to_frame(name='holiday')
    return pd.merge(df, holiday_dates, left_index=True, right_index=True, how='left')


df = get_british_holidays(df)
df['rank'] = 0

# first attempt
for date, rank in df.loc[df.index.month.isin([12, 1])].iterrows():
    if df.loc[date, 'holiday'] == 'Christmas Day':
        if date.dayofweek <= 3:
            holiday_period_start = pd.Timestamp(date.year, date.month, date.day - 3 - date.dayofweek)
        else:
            holiday_period_start = pd.Timestamp(date.year, date.month, date.day - date.dayofweek)

    if df.loc[date, 'holiday'] == 'New Year Holiday [Scotland]':
        holiday_period_end = pd.Timestamp(date.year, date.month, date.day + date.dayofweek)

# second attempt
for date, rank in df.loc[df.index.month.isin([12, 1])].iterrows():

    if date in pd.date_range(pd.Timestamp(date.year, 12, 24), pd.Timestamp(date.year + 1, 1, 2)):
        df.loc[date, 'rank'] = 5
    elif date == pd.Timestamp(date.year, 12, 25):
        df.loc[date, 'rank'] = 1
    elif date in pd.date_range(pd.Timestamp(date.year, 12, 26), pd.Timestamp(date.year, 12, 27)) or \
            date in pd.date_range(pd.Timestamp(date.year, 1, 1), pd.Timestamp(date.year, 1, 2)):
        df.loc[date, 'rank'] = 2
    elif date.dayofweek < 5 and date in pd.date_range(pd.Timestamp(date.year, 12, 24),
                                                      pd.Timestamp(date.year, 1, 1)):
        if df.loc[date, 'rank'] == 0:
            df.loc[date, 'rank'] = 3

Upvotes: 1

Views: 234

Answers (1)

RAEC
RAEC

Reputation: 332

I would break it down first by the obvious, the groups: "Christmas/New Year", Easter, "First Bank Holiday in May", etc. Taking care of one group at a time, try to find what the group main condition is. For example, let's work Christmas/New Year:

  • It all starts with the 25th. (CODE 1)
  • Apply (CODE 5) to all days from 25th to Scotland New Year Bank holiday), you can overwrite them as needed later.
  • The next big condition is days before the 25th (CODE 4):
    • If the 25 is Mon,Tue,Wed, iterate backwards from the 24th and mark all days with code 4 until you reach the Friday before.
    • Otherwise, still iterate backward but stop at Monday before the 25th.
  • Next best one is probably CODE 3, overwriting any code 5's as necessary.
  • Finally, apply code 2, overwriting whatever may be there (3 or 5).

I would apply similar approach to the other groups. Each point i make here, i would turn into a small function to keep things organized and easy to read.

I am not clear on your intent of use for this. If it were me, i would store all days that are holidays and their codes in a dictionary first. Then, when done, convert it to whatever you are trying to actually use. Basically skip panda until the end.

Upvotes: 0

Related Questions