Asa Ya
Asa Ya

Reputation: 69

Filtering a dataframe according to datetime column of other dataframe

I have two dataframes, denoted by df1 and df2. The df1 has 6 columns and df2 has 4 columns. The df1 has a column date that the smallest unit is second, but in the df2 is the hour. I am going to filter the df1 according to the df2. It means, I need to extract all records in a df1 that has the same hour as the df2.

Sample of data for more clarification

df1:                                  df2:

Date (yyyy-mm-dd hh:mm:ss)           Date (yyyy-mm-dd hh:--:--)

2016-03-01 1:02:03                   2016-03-01 1:00:00
2016-04-01 1:03:04                   2016-04-01 2:00:00
2016-05-01 10:04:05                  2016-05-01 3:00:00
2016-05-01 11:07:08                  2016-05-01 4:00:00

The desired output is:

df1: 

2016-03-01 1:02:03
2016-04-01 1:03:04

Only the first two rows in the df1 are extracted because their hours exist in the df2.

Thank you in advance

Upvotes: 2

Views: 660

Answers (1)

jezrael
jezrael

Reputation: 862406

Use boolean indexing with Series.dt.hour for extract hours with Series.isin:

df1['Date'] = pd.to_datetime(df1['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])


df = df1[df1['Date'].dt.hour.isin(df2['Date'].dt.hour)]
print (df)
                 Date
0 2016-03-01 01:02:03
1 2016-04-01 01:03:04

If need match dates with hours use Series.dt.floor for match df2['Date']:

df3 = df1[df1['Date'].dt.floor('H').isin(df2['Date'])]
print (df3)
                 Date
0 2016-03-01 01:02:03

EDIT: For check how working hours and floor function create helper columns:

print (df1.assign(hour=df1['Date'].dt.hour, floor=df1['Date'].dt.floor('H')))

                 Date  hour               floor
0 2016-03-01 01:02:03     1 2016-03-01 01:00:00
1 2016-04-01 01:03:04     1 2016-04-01 01:00:00
2 2016-05-01 10:04:05    10 2016-05-01 10:00:00
3 2016-05-01 11:07:08    11 2016-05-01 11:00:00

print (df2.assign(hour=df2['Date'].dt.hour))

                 Date  hour
0 2016-03-01 01:00:00     1
1 2016-04-01 02:00:00     2
2 2016-05-01 03:00:00     3
3 2016-05-01 04:00:00     4

EDIT1: Problem was with timezones in df2['Date', solution is remove them by Series.dt.tz_localize:

df3 = df2[df2['Date'].dt.floor('H').dt.tz_localize(None).isin(df1['Date'])]

Upvotes: 2

Related Questions