Henrij
Henrij

Reputation: 31

Check if date is in another dataframe and set a new column

I have given two dataframes

df1:

    DATE     INTERVALL    
0  2017-01-01  00:30:00.000      
1  2017-01-01  01:00:00.000  
2  2017-01-01  01:30:00.000   
3  2017-01-01  02:00:00.000      
4  2017-01-01  02:30:00.000
....
200 2017-05-09 00:30:00.000
201 2017-05-09 01:00:00.000  
202 2017-05-09 01:30:00.000  
203 2017-05-09 02:00:00.000
....        

df2:

Name                         Date                       
Neujahr                     2017-01-01
Karfreitag                  2017-04-14
Ostersonntag                2017-04-16
Ostermontag                 2017-04-17
1. Mai                      2017-05-01
Christi Himmelfahrt         2017-05-25
Pfingstsonntag              2017-06-04

i want to have in df1 a new Column with 'HOLIDAY'. "HOLIDAY" should have the value 1 if the 'DATE' Column is included in df2.

Example:

    DATE     INTERVALL        HOLIDAY    
0  2017-01-01  00:30:00.000     1      
1  2017-01-01  01:00:00.000     1
2  2017-01-01  01:30:00.000     1
3  2017-01-01  02:00:00.000     1 
4  2017-01-01  02:30:00.000     1
....
200 2017-05-09 00:30:00.000     0 
201 2017-05-09 01:00:00.000     0
202 2017-05-09 01:30:00.000     0
203 2017-05-09 02:00:00.000     0 
... 

i tried with a if statement, but thats to slow. I think there is a better solutions:

    if row['DATE'] == "2017-01-01":
        df1.set_value(index, 'HOLIDAY', 1)

Upvotes: 1

Views: 1269

Answers (1)

jezrael
jezrael

Reputation: 863651

Use isin with converting boolean mask to True to 1 and False to 0 by astype:

#convert to datetimes if necessary
df1['DATE'] = pd.to_datetime(df1['DATE'])
df2['Date'] = pd.to_datetime(df2['Date'])

df1['HOLIDAY'] = df1['DATE'].isin(df2['Date']).astype(int)
print (df1)
          DATE     INTERVALL  HOLIDAY
0   2017-01-01  00:30:00.000        1
1   2017-01-01  01:00:00.000        1
2   2017-01-01  01:30:00.000        1
3   2017-01-01  02:00:00.000        1
4   2017-01-01  02:30:00.000        1
200 2017-05-09  00:30:00.000        0
201 2017-05-09  01:00:00.000        0
202 2017-05-09  01:30:00.000        0
203 2017-05-09  02:00:00.000        0

Detail:

print (df1['DATE'].isin(df2['Date']))
0       True
1       True
2       True
3       True
4       True
200    False
201    False
202    False
203    False
Name: DATE, dtype: bool

print (df1['DATE'].dtype)
datetime64[ns]

print (df2['Date'].dtype)
datetime64[ns]

Upvotes: 3

Related Questions