Reputation: 69
I have two dataframes, denoted by df1 and df2. The df1 has 6 columns and df2 has 4 columns. The df1 has a column date that the smallest unit is second, but in the df2 is the hour. I am going to filter the df1 according to the df2. It means, I need to extract all records in a df1 that has the same hour as the df2.
Sample of data for more clarification
df1: df2:
Date (yyyy-mm-dd hh:mm:ss) Date (yyyy-mm-dd hh:--:--)
2016-03-01 1:02:03 2016-03-01 1:00:00
2016-04-01 1:03:04 2016-04-01 2:00:00
2016-05-01 10:04:05 2016-05-01 3:00:00
2016-05-01 11:07:08 2016-05-01 4:00:00
The desired output is:
df1:
2016-03-01 1:02:03
2016-04-01 1:03:04
Only the first two rows in the df1 are extracted because their hours exist in the df2.
Thank you in advance
Upvotes: 2
Views: 660
Reputation: 862406
Use boolean indexing
with Series.dt.hour
for extract hours with Series.isin
:
df1['Date'] = pd.to_datetime(df1['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])
df = df1[df1['Date'].dt.hour.isin(df2['Date'].dt.hour)]
print (df)
Date
0 2016-03-01 01:02:03
1 2016-04-01 01:03:04
If need match dates with hours use Series.dt.floor
for match df2['Date']
:
df3 = df1[df1['Date'].dt.floor('H').isin(df2['Date'])]
print (df3)
Date
0 2016-03-01 01:02:03
EDIT: For check how working hours and floor function create helper columns:
print (df1.assign(hour=df1['Date'].dt.hour, floor=df1['Date'].dt.floor('H')))
Date hour floor
0 2016-03-01 01:02:03 1 2016-03-01 01:00:00
1 2016-04-01 01:03:04 1 2016-04-01 01:00:00
2 2016-05-01 10:04:05 10 2016-05-01 10:00:00
3 2016-05-01 11:07:08 11 2016-05-01 11:00:00
print (df2.assign(hour=df2['Date'].dt.hour))
Date hour
0 2016-03-01 01:00:00 1
1 2016-04-01 02:00:00 2
2 2016-05-01 03:00:00 3
3 2016-05-01 04:00:00 4
EDIT1: Problem was with timezones in df2['Date'
, solution is remove them by Series.dt.tz_localize
:
df3 = df2[df2['Date'].dt.floor('H').dt.tz_localize(None).isin(df1['Date'])]
Upvotes: 2