Saturnix
Saturnix

Reputation: 10564

Fastest and most correct way to search for date ignoring time, between 2 pandas dataframes

I have 2 dataframes. One has a datetime index and contains daily data points:

df1.head(5)

            data_point
Date                                                                 
2004-04-01  1.180180     
2004-04-02  1.213555      
2004-04-05  1.243821      
2004-04-06  1.224543      
2004-04-07  1.213832    

The other one has 5 minutes data points:

df2.head(5)  

                      Open   High                   Datetime
Date_Time                                                                   
2010-01-04 09:00:00  22.40  22.45         2010-01-04 09:00:00
2010-01-04 09:05:00  22.45  22.45         2010-01-04 09:05:00
2010-01-04 09:10:00  22.40  22.45         2010-01-04 09:10:00
2010-01-04 09:15:00  22.45  22.45         2010-01-04 09:15:00
2010-01-04 09:25:00  22.35  22.35         2010-01-04 09:25:00

Problem: While I'm iterating through the values of df2, I need to know what's the daily value in df1.

This would require to query df1 288 times for each day (there are 288 units of 5 minutes in a day) to get the same data.

Question: what would be the most correct and efficient way to solve this problem? Merging the 2 dataframes so that df3 looks like this? But how?

                      Open   High                   daily_data_point_from_df1
Date_Time                                                                   
2010-01-04 09:00:00  22.40  22.45                   1.111
2010-01-04 09:05:00  22.45  22.45                   1.111  
2010-01-04 09:10:00  22.40  22.45                   1.111
2010-01-04 09:15:00  22.45  22.45                   1.111   
2010-01-04 09:25:00  22.35  22.35                   1.111

Or should I query df1 every day and manually handle the logic to avoid doing it many times for the same day?

for index, row in df2.iterrows():
     # Manually check if date changed and do this if it did:
     date = df2['Datetime'].replace(hour=0, minute=0)
     daily_data_point_from_df1 = df.loc[date]

Upvotes: 0

Views: 36

Answers (1)

BENY
BENY

Reputation: 323226

You can using merge

df3=df2.assign(Date=df2.Datetime.dt.date).merge(df1.reset_index(),on='Date',how='left')

Upvotes: 1

Related Questions