Reputation: 31
I have given two dataframes
df1:
DATE INTERVALL
0 2017-01-01 00:30:00.000
1 2017-01-01 01:00:00.000
2 2017-01-01 01:30:00.000
3 2017-01-01 02:00:00.000
4 2017-01-01 02:30:00.000
....
200 2017-05-09 00:30:00.000
201 2017-05-09 01:00:00.000
202 2017-05-09 01:30:00.000
203 2017-05-09 02:00:00.000
....
df2:
Name Date
Neujahr 2017-01-01
Karfreitag 2017-04-14
Ostersonntag 2017-04-16
Ostermontag 2017-04-17
1. Mai 2017-05-01
Christi Himmelfahrt 2017-05-25
Pfingstsonntag 2017-06-04
i want to have in df1 a new Column with 'HOLIDAY'. "HOLIDAY" should have the value 1 if the 'DATE' Column is included in df2.
Example:
DATE INTERVALL HOLIDAY
0 2017-01-01 00:30:00.000 1
1 2017-01-01 01:00:00.000 1
2 2017-01-01 01:30:00.000 1
3 2017-01-01 02:00:00.000 1
4 2017-01-01 02:30:00.000 1
....
200 2017-05-09 00:30:00.000 0
201 2017-05-09 01:00:00.000 0
202 2017-05-09 01:30:00.000 0
203 2017-05-09 02:00:00.000 0
...
i tried with a if statement, but thats to slow. I think there is a better solutions:
if row['DATE'] == "2017-01-01":
df1.set_value(index, 'HOLIDAY', 1)
Upvotes: 1
Views: 1269
Reputation: 863651
Use isin
with converting boolean mask
to True
to 1
and False
to 0
by astype
:
#convert to datetimes if necessary
df1['DATE'] = pd.to_datetime(df1['DATE'])
df2['Date'] = pd.to_datetime(df2['Date'])
df1['HOLIDAY'] = df1['DATE'].isin(df2['Date']).astype(int)
print (df1)
DATE INTERVALL HOLIDAY
0 2017-01-01 00:30:00.000 1
1 2017-01-01 01:00:00.000 1
2 2017-01-01 01:30:00.000 1
3 2017-01-01 02:00:00.000 1
4 2017-01-01 02:30:00.000 1
200 2017-05-09 00:30:00.000 0
201 2017-05-09 01:00:00.000 0
202 2017-05-09 01:30:00.000 0
203 2017-05-09 02:00:00.000 0
Detail:
print (df1['DATE'].isin(df2['Date']))
0 True
1 True
2 True
3 True
4 True
200 False
201 False
202 False
203 False
Name: DATE, dtype: bool
print (df1['DATE'].dtype)
datetime64[ns]
print (df2['Date'].dtype)
datetime64[ns]
Upvotes: 3