Reputation: 73
I have a pandas dataframe that has 2 columns named date_x and date_y. I want to pull what season date_x is in (winter, spring, summer, or fall) in a separate column and then determine if a holiday falls with the range of dates exclusive of date_y. I have seen some potential solutions around that have done seasons by breaking down the year into days but I can't seem to make that work for me. I am still figuring out what pandas can do. Another problem I am faced with is how to account for holidays like Easter that change every year. Any help is truly appreciated.
I have also played around with trying to make this solution work for me but can't figure out how to adapt it to create a new column that works off of date_x instead of today's date
Below is an image of what I am striving for.
Upvotes: 2
Views: 1341
Reputation: 16683
Let's say that you have a dataframe with dates that not only range multiple holidays, but they range multiple seasons:
Date_x Date_y
0 2020-12-22 2021-01-01
1 2020-06-20 2020-07-11
3 2020-02-11 2020-03-27
4 2020-05-22 2020-06-27
In order to get 1. Season
and 2. Holiday
:
USFederalHolidayCalendar
from the pandas
holiday
library; because, I thought that would be more reliable; however, I do not have much experieince with holiday libraries. Also, there are multiple calendars
that could be used from the pandas library. From there, I used the get_season(x)
and get_holiday()
function that I created. For the former, I would reference the link in your question, and the latter uses list comprehension to pull in holidays into your dataframe from the holidays
dataframe I created.from pandas.tseries.holiday import USFederalHolidayCalendar
from datetime import datetime
import pandas as pd
cal = USFederalHolidayCalendar()
holidays = (pd.DataFrame(cal.holidays(return_name=True), columns=['Holiday'])
.reset_index()
.rename({'index': 'Date'}, axis=1))
holidays['Date'] = pd.to_datetime(holidays['Date'])
df= pd.DataFrame({'Date_x': {0: '2020-12-22', 1: '2020-06-20', 2: '2020-02-11', 3: '2020-05-22'},
'Date_y': {0: '2021-01-01', 1: '2020-07-11', 2: '2020-03-27', 3: '2020-06-27'}})
df['Date_x'] = pd.to_datetime(df['Date_x'])
df['Date_y'] = pd.to_datetime(df['Date_y'])
Y = 2000 # dummy leap year to allow input X-02-29 (leap day)
seasons = [('Winter', (date(Y, 1, 1), date(Y, 3, 20))),
('Spring', (date(Y, 3, 21), date(Y, 6, 20))),
('Summer', (date(Y, 6, 21), date(Y, 9, 22))),
('Fall', (date(Y, 9, 23), date(Y, 12, 20))),
('Winter', (date(Y, 12, 21), date(Y, 12, 31)))]
def get_season(x):
x = x.replace(year=Y)
return next(season for season, (start, end) in seasons
if start <= x <= end)
def get_holiday():
return pd.DataFrame([(h,y,z) for (h,d) in zip(holidays['Holiday'], holidays['Date'])
for (y, z) in zip(df['Date_x'], df['Date_y']) if y.date() <= d.date() if d.date() <= z.date()], columns=['Holiday', 'Date_x', 'Date_y'])
s1 = df['Date_x'].apply(lambda x: get_season(x))
s2 = df['Date_y'].apply(lambda x: get_season(x))
df['Season']= [', '.join(list(set([x,y]))) for (x,y) in zip(s1,s2)]
dft = get_holiday()
dft = dft.groupby(['Date_x', 'Date_y'])['Holiday'].apply(lambda x: ', '.join(list(x)))
df = pd.merge(df, dft, how='left', on=['Date_x', 'Date_y'])
df
Out[32]:
Date_x Date_y Season Holiday
0 2020-12-22 2021-01-01 Winter Christmas, New Years Day
1 2020-06-20 2020-07-11 Summer, Spring July 4th
2 2020-02-11 2020-03-27 Spring, Winter Presidents Day
3 2020-05-22 2020-06-27 Summer, Spring Memorial Day
Upvotes: 1
Reputation: 249123
Convert your columns to datetime type (if they aren't already) and construct two new empty columns:
df.Date_x = pd.to_datetime(df.Date_x)
df.Date_y = pd.to_datetime(df.Date_y)
Adding the Season column is easy once you notice the quarter
attribute of datetime Series:
df['Season'] = df.Date_x.dt.quarter.map({1: 'Spring', 2: 'Summer', 3: 'Fall', 4: 'Winter'})
Holidays are obviously a bit more involved. Start with an empty column:
df['Holiday'] = ''
Easter is a special one, let's take care of that first. Pandas has it built in, though you can define your own Holiday rules too.
easter = df.Date_y - pd.tseries.offsets.Easter()
That gives you the Easter before Date_y
(I could have used Date_x + Easter()
but you said you wanted it exclusive of Date_y
). Then:
df.loc[df.Date_x < easter, 'Holiday'] = 'Easter'
That puts "Easter" in the rows where the holiday is within [Date_x, Date_y)
.
I leave it as an exercise for you to create the holiday rules for the rest of your holidays. Here's some advice on that: Pandas Time Series Holiday Rule Offset
Upvotes: 1
Reputation: 1794
I believe you're looking for something like below. Notice that part of this solution uses the answer provided here.
You may have to adjust to account for your exclusive requirement, but this will give you the idea.
import pandas as pd
import numpy as np
import holidays # pip install holidays
from datetime import date, datetime
us_holidays = holidays.UnitedStates()
Y = 2000 # dummy leap year to allow input X-02-29 (leap day)
seasons = [('winter', (date(Y, 1, 1), date(Y, 3, 20))),
('spring', (date(Y, 3, 21), date(Y, 6, 20))),
('summer', (date(Y, 6, 21), date(Y, 9, 22))),
('autumn', (date(Y, 9, 23), date(Y, 12, 20))),
('winter', (date(Y, 12, 21), date(Y, 12, 31)))]
def get_season(dt):
if isinstance(dt, datetime):
dt = dt.date()
dt = dt.replace(year=Y)
return next(season for season, (start, end) in seasons
if start <= dt <= end)
def get_holiday(data):
return ",".join([us_holidays.get(x) for x in us_holidays[data["Date_X"] : data["Date_Y"]]])
np.random.seed(0)
rng_x = pd.date_range('2020-12-22', periods=365, freq='D')
rng_y = pd.date_range('2020-12-26', periods=365, freq='D')
df = pd.DataFrame({ 'Date_X': rng_x, 'Season_X': "", 'Date_Y': rng_y, 'Season_Y': ""})
print(df.head())
df['Season_X'] = df["Date_X"].apply(get_season)
df['Season_Y'] = df["Date_Y"].apply(get_season)
print(df.head())
df['Holiday'] = df.apply(get_holiday, axis=1)
print(df.head())
Output:
Date_X Season_X Date_Y Season_Y Holiday
0 2020-12-22 winter 2020-12-26 winter Christmas Day
1 2020-12-23 winter 2020-12-27 winter Christmas Day
2 2020-12-24 winter 2020-12-28 winter Christmas Day
3 2020-12-25 winter 2020-12-29 winter Christmas Day
4 2020-12-26 winter 2020-12-30 winter
Upvotes: 2