Bryan Downing
Bryan Downing

Reputation: 207

Join Python dataframe time series efficiently

I have the following 2 dataframes with:

day
        date     val
11740 2016-01-04  1.3970
11741 2016-01-05  1.3991
11742 2016-01-06  1.4084
11743 2016-01-07  1.4061

and

df
        Adj_Close         Close        Date          High           Low
182  12927.200195  12927.200195  2016-01-04  12928.900391      12748.50   
181  12920.099609  12920.099609  2016-01-05  12954.900391  12839.799805   
180  12726.799805  12726.799805  2016-01-06  12854.599609  12701.700195   
179  12448.200195  12448.200195  2016-01-07  12661.200195  12439.099609 

I have a cheesy loop to aligh the date to create a new dataframe (new_df) by 'joining' the common date.

new_df = pd.DataFrame(columns=['date', 'close', 'fx', 'usd'])

for indexFx, rowFx in day.iterrows():
    for indexSt, rowSt in df.iterrows(): #this is not efficient 
        fxDate = str(rowFx.date)[:10] #only keep data component not time
        if str(rowSt['Date']) == fxDate:

            dateObj = datetime.strptime(rowSt.Date,'%Y-%m-%d')
            row = [dateObj, rowSt.Close,rowFx.val, float(rowSt.Close) * float(rowFx.val)]
            new_df.loc[len(new_df)] = row

I know there is an efficient way to Pythonize this loop. Can someone help?

Thanks

Upvotes: 2

Views: 318

Answers (1)

vk1011
vk1011

Reputation: 7179

pd.concat([day.set_index('date'), df.set_index('Date')], axis=1)

>>>

               val     Adj_Close         Close          High  \
2016-01-04  1.3970  12927.200195  12927.200195  12928.900391
2016-01-05  1.3991  12920.099609  12920.099609  12954.900391
2016-01-06  1.4084  12726.799805  12726.799805  12854.599609
2016-01-07  1.4061  12448.200195  12448.200195  12661.200195

                     Low
2016-01-04  12748.500000
2016-01-05  12839.799805
2016-01-06  12701.700195
2016-01-07  12439.099609

Depending on if you want an inner or outer join, you can specify that with join='inner' or join='outer'.

Upvotes: 1

Related Questions