danwri
danwri

Reputation: 173

Can I join two dataframes while only retaining rows based on datetimes featured in the second dataframe?

Dataframe A ('df_a') contains location-split temperature values at re-sampled 5-minute intervals:

logtime_round       | location  | value
2017-05-01 06:05:00 | 0         | 17
2017-05-01 06:05:00 | 1         | 14.5
2017-05-01 06:05:00 | 2         | 14.5
etc...

Dataframe B ('df_b') contains temperature values (re-sampled from hourly to daily):

logtime_round | airtemp
2017-05-01    | 10.33333
2017-05-02    | 10.42083
etc...

I have manipulated df_b so that only airtemp (format: datetime64[ns]) <= 15.5 are included, and now would like to manipulate df_a so that a new dataframe is created featuring only the same days included in df_b (I'm only interested in locations and values when outdoor air temperature was below <= 15.5).

Is this possible?

My first plan was to join the two dataframes and then look to remove any NaN airtemp values to get my desired df, however, the df_b airtemp is only featured for the first row (e.g. for 2017-05-01) with the rest as NaNs. So perhaps the df_b daily airtemp can be duplicated across all rows in the same day?

joindf = df_a.join(df_b)

Thanks!

Upvotes: 1

Views: 32

Answers (1)

Code Different
Code Different

Reputation: 93181

Use merge_asof (assuming both frames have been sorted by time):

pd.merge_asof(df_a, df_b, on='logtime_round')

Upvotes: 1

Related Questions