Batmax

Reputation: 253

Create dummy values for a list of dates in pandas

I have a dataframe with a double index (day, time) and would like to create a new column 'Holiday' equal to one if the index day belongs to a list of holidays.

My list of holidays of type DatetimeIndex:

holidays = ['2017-09-11', '2017-12-24']

My original dataframe:

                 Visitor
Date       Time
2017-09-11 4:45        0
           5:00        1
           5:15       26
....
2017-09-12 4:45        0
           5:00        1
           5:15       26
....
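To make the question reproducible, here is a minimal sketch that rebuilds the sample frame. It assumes the Date level holds datetime64 values and the Time level holds plain strings; the real dtypes are not shown in the question, so treat them as assumptions.

```python
import pandas as pd

# Assumed reconstruction of the sample data: a (Date, Time) MultiIndex with a
# single 'Visitor' column. Dates are datetime64; times are plain strings.
dates = pd.to_datetime(['2017-09-11', '2017-09-12'])
times = ['4:45', '5:00', '5:15']
index = pd.MultiIndex.from_product([dates, times], names=['Date', 'Time'])
df = pd.DataFrame({'Visitor': [0, 1, 26, 0, 1, 26]}, index=index)

# Holidays as a DatetimeIndex, so they compare cleanly with the datetime64
# 'Date' level.
holidays = pd.DatetimeIndex(['2017-09-11', '2017-12-24'])

print(df)
```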

What I would like to have:

                 Visitor  Holiday
Date       Time
2017-09-11 4:45        0        1
           5:00        1        1
           5:15       26        1
....
2017-09-12 4:45        0        0
           5:00        1        0
           5:15       26        0
....

Here is what I tried based on this previous answer:

df['Holiday'] = int(df.index.get_level_values(0) in holidays == True)

However, my 'Holiday' column always has the value 0...

Thanks in advance!

Upvotes: 0

Views: 3925

Answers (2)

cs95

Reputation: 402982

Your current solution should actually throw a ValueError:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Note that the in operator tests whether a single object is a member of a container; it does not perform an element-wise membership test over a pandas Index or DataFrame. Instead, pandas overloads the comparison and bitwise operators and provides a comprehensive set of methods, such as isin, for conditional and boolean logic.
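A small sketch of the difference, using a plain list of date strings and a string Index as assumed inputs. With a list, in compares each list item against the whole Index, and the resulting boolean array cannot be reduced to a single True/False, which is where the ValueError comes from.

```python
import pandas as pd

holidays = ['2017-09-11', '2017-12-24']
dates = pd.Index(['2017-09-11', '2017-09-11', '2017-09-12'])

# `in` asks one yes/no question: is this single object in the container?
print('2017-09-11' in holidays)  # True

# Using `in` with a whole Index: each list item is compared with the Index,
# producing a boolean array whose truth value is ambiguous.
raised = False
try:
    dates in holidays
except ValueError as exc:
    raised = True
    print('ValueError:', exc)

# isin answers the question element-wise, one boolean per entry.
mask = dates.isin(holidays)
print(mask)  # [ True  True False]
```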


You can use isin to build a boolean mask and np.where to map it to 1 and 0:

import numpy as np

df['Holiday'] = np.where(df.index.get_level_values(0).isin(holidays), 1, 0)
df


                 Visitor  Holiday
Date       Time                  
2017-09-11 4:45        0        1
           5:00        1        1
           5:15       26        1
2017-09-12 4:45        0        0
           5:00        1        0
           5:15       26        0

Upvotes: 2

Zero

Reputation: 77017

Take the date level with get_level_values, test it against the holidays with isin, and use astype(int) to convert the booleans to integers.

In [192]: df['Holiday'] = df.index.get_level_values(0).isin(holidays).astype(int)

In [193]: df
Out[193]:
                 Visitor  Holiday
Date       Time
2017-09-11 4:45        0        1
           5:00        1        1
           5:15       26        1
2017-09-12 4:45        0        0
           5:00        1        0
           5:15       26        0
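Both answers build the same boolean mask and differ only in how they turn it into integers. A quick check on an assumed sample frame (datetime64 Date level, holidays converted with pd.to_datetime) shows the two spellings agree:

```python
import numpy as np
import pandas as pd

# Assumed sample frame matching the layout in the question.
dates = pd.to_datetime(['2017-09-11', '2017-09-12'])
index = pd.MultiIndex.from_product([dates, ['4:45', '5:00', '5:15']],
                                   names=['Date', 'Time'])
df = pd.DataFrame({'Visitor': [0, 1, 26, 0, 1, 26]}, index=index)
holidays = pd.to_datetime(['2017-09-11', '2017-12-24'])

# Boolean mask: one True/False per row, based only on the Date level.
mask = df.index.get_level_values('Date').isin(holidays)

via_astype = mask.astype(int)     # True -> 1, False -> 0
via_where = np.where(mask, 1, 0)  # same mapping, spelled out explicitly

print(via_astype)                       # [1 1 1 0 0 0]
print((via_astype == via_where).all())  # True
```

astype(int) is slightly shorter; np.where is handy if you later want labels other than 1 and 0.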

If you want a new DataFrame instead of modifying df in place, use assign:

In [196]: df.assign(Holiday=df.index.get_level_values(0).isin(holidays).astype(int))
Out[196]:
                 Visitor  Holiday
Date       Time
2017-09-11 4:45        0        1
           5:00        1        1
           5:15       26        1
2017-09-12 4:45        0        0
           5:00        1        0
           5:15       26        0
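A minimal sketch confirming that assign leaves the original frame untouched, again on an assumed sample frame with a datetime64 Date level:

```python
import pandas as pd

# Assumed sample frame matching the layout in the question.
dates = pd.to_datetime(['2017-09-11', '2017-09-12'])
index = pd.MultiIndex.from_product([dates, ['4:45', '5:00', '5:15']],
                                   names=['Date', 'Time'])
df = pd.DataFrame({'Visitor': [0, 1, 26, 0, 1, 26]}, index=index)
holidays = pd.to_datetime(['2017-09-11', '2017-12-24'])

# assign returns a new DataFrame with the extra column; df keeps its columns.
out = df.assign(Holiday=df.index.get_level_values('Date').isin(holidays).astype(int))

print(list(df.columns))   # ['Visitor']
print(list(out.columns))  # ['Visitor', 'Holiday']
```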

Upvotes: 2
