Reputation: 605
Hello world,
I would like to retrieve the number of public holidays for each month.
Here is my dataset:
City date value End_date
BE 01/01/16 41 31/01/16
NW 01/10/16 74 31/10/16
BY 01/05/16 97 31/05/16
With the following code, I can check manually whether a given day is a public holiday:
from datetime import date
import holidays
# prov is one of: BW, BY, BE, BB, HB, HH, HE, MV, NI, NW, RP, SL, SN, ST, SH, TH
de_holidays = holidays.CountryHoliday('DE', prov='NW', state=None)
date(2020, 5, 21) in de_holidays
out:
False
The questions: How can I count the number of 'True' values for each month? How can I store that count in the dataframe?
Expected output:
City date value End_date Nb_pub_holiday
BE 01/01/16 41 31/01/16 2
NW 01/10/16 74 31/10/16 0
BY 01/05/16 97 31/05/16 4
Upvotes: 3
Views: 1248
Reputation: 862511
I am not sure why, but I get different output from your expected values. Use a custom function with date_range
and count the matching values with sum
over a generator:
import pandas as pd
import holidays
# convert columns to datetimes
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%y')
df['End_date'] = pd.to_datetime(df['End_date'], format='%d/%m/%y')
def f1(x):
    # holiday calendar for the German state stored in this row
    h = holidays.CountryHoliday('DE', prov=x['City'], state=None)
    # every day from date to End_date, both inclusive
    d = pd.date_range(x['date'], x['End_date'])
    # count the days that are public holidays
    return sum(y in h for y in d)
df['Nb_pub_holiday'] = df.apply(f1, axis=1)
print(df)
City date value End_date Nb_pub_holiday
0 BE 2016-01-01 41 2016-01-31 1
1 NW 2016-10-01 74 2016-10-31 1
2 BY 2016-05-01 97 2016-05-31 4
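The counting step can also be illustrated without any third-party package. This is only a minimal sketch, not the holidays library itself: the set BY_HOLIDAYS_MAY_2016 is hard-coded from the Bavarian dates printed in this answer, and iter_days and count_holidays are hypothetical helper names.

```python
from datetime import date, timedelta

# Assumption: hard-coded stand-in for a holiday calendar object,
# holding the Bavarian (BY) dates for May 2016 shown in this answer.
BY_HOLIDAYS_MAY_2016 = {
    date(2016, 5, 1),
    date(2016, 5, 5),
    date(2016, 5, 16),
    date(2016, 5, 26),
}


def iter_days(start, end):
    """Yield every date from start to end, both inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def count_holidays(start, end, holiday_dates):
    """Count the days in [start, end] that appear in holiday_dates."""
    # Each membership test is True or False; sum() adds them up as 1 or 0.
    return sum(day in holiday_dates for day in iter_days(start, end))


n = count_holidays(date(2016, 5, 1), date(2016, 5, 31), BY_HOLIDAYS_MAY_2016)
print(n)  # 4
```

Summing booleans works because True counts as 1 and False as 0, so no explicit counter is needed.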
To get the list of holiday dates instead of the count, it is possible to use:
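def f2(x):
    # holiday calendar for the German state stored in this row
    h = holidays.CountryHoliday('DE', prov=x['City'], state=None)
    d = pd.date_range(x['date'], x['End_date'])
    # keep only the days that are public holidays, as plain dates
    return [y.date() for y in d if y in h]
df['Lst_pub_holiday'] = df.apply(f2, axis=1)
print(df)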
City date value End_date \
0 BE 2016-01-01 41 2016-01-31
1 NW 2016-10-01 74 2016-10-31
2 BY 2016-05-01 97 2016-05-31
Lst_pub_holiday
0 [2016-01-01]
1 [2016-10-03]
2 [2016-05-01, 2016-05-05, 2016-05-16, 2016-05-26]
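For readers who want to trace the row-wise logic by hand, here is a rough sketch on plain dictionaries using only the standard library. It is an illustration under assumptions: KNOWN_HOLIDAYS is a hand-written lookup containing just the dates printed above (a real run would query the holidays package), and parse_day and holidays_in_row are hypothetical helper names.

```python
from datetime import date, datetime, timedelta

# Assumption: hand-written lookup keyed by state code, containing only
# the dates printed in the output above, not a full holiday calendar.
KNOWN_HOLIDAYS = {
    "BE": {date(2016, 1, 1)},
    "NW": {date(2016, 10, 3)},
    "BY": {date(2016, 5, 1), date(2016, 5, 5), date(2016, 5, 16), date(2016, 5, 26)},
}


def parse_day(text):
    """Parse a dd/mm/yy string such as '31/01/16' into a date."""
    return datetime.strptime(text, "%d/%m/%y").date()


def holidays_in_row(row):
    """Return the sorted holiday dates between row['date'] and row['End_date']."""
    start = parse_day(row["date"])
    end = parse_day(row["End_date"])
    known = KNOWN_HOLIDAYS[row["City"]]
    days = (start + timedelta(days=i) for i in range((end - start).days + 1))
    return [d for d in days if d in known]


rows = [
    {"City": "BE", "date": "01/01/16", "End_date": "31/01/16"},
    {"City": "NW", "date": "01/10/16", "End_date": "31/10/16"},
    {"City": "BY", "date": "01/05/16", "End_date": "31/05/16"},
]

for row in rows:
    row["Lst_pub_holiday"] = holidays_in_row(row)
    # The count is simply the length of the list of matched dates.
    row["Nb_pub_holiday"] = len(row["Lst_pub_holiday"])
    print(row["City"], row["Nb_pub_holiday"], row["Lst_pub_holiday"])
```

Keeping the list and deriving the count from its length avoids scanning the date range twice.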
Upvotes: 5