Pandas: Adding Two Columns From Two Dataframes With Different Time Series

Question

I have two dataframes (let's call them df_1 and df_2):

epoch_ns                        net
1635747118006744249               64.75
1635747144020144197             -135.27

epoch_ns                        net
1635747085588589391              -15.25
1635747224568306088             -775.29
1635747242571214411               24.66

I would like to add both df_1['net'] and df_2['net'] together to have a resultant dataframe like this:

epoch_ns                        net
1635747085588589391              -15.25
1635747118006744249                49.5
1635747144020144197             -150.52
1635747224568306088             -910.56
1635747242571214411             −910.56

One way is to resample both frames with 1ms frequency (losing the precision) but it could use a lot of memory when my both frames have a long time horizontal.

Any other more efficient way (time wise and memory wise)?

mozway · Accepted Answer

You can use pandas.concat, and cumsum.

As you already have a cumulated sum as input, you first need to take the diff:

(pd.concat([df1.assign(net=df1['net'].diff().combine_first(df1['net'])),
            df2.assign(net=df2['net'].diff().combine_first(df2['net'])),
           ])
   .sort_values(by='epoch_ns')
   .assign(net=lambda d:d['net'].cumsum())
 )

output:

              epoch_ns     net
0  1635747085588589391  -15.25
0  1635747118006744249   49.50
1  1635747144020144197 -150.52
1  1635747224568306088 -910.56
2  1635747242571214411 -110.61

intermediate step to uncumulate the net (example on df2):

>>> df2.assign(raw=df2['net'].diff().combine_first(df2['net']))
              epoch_ns     net     raw
0  1635747085588589391  -15.25  -15.25
1  1635747224568306088 -775.29 -760.04
2  1635747242571214411   24.66  799.95

original answer

(pd.concat([df1,df2])
   .sort_values(by='epoch_ns')
   .assign(net=lambda d:d['net'].cumsum())
 )

Output:

              epoch_ns     net
0  1635747085588589391  -15.25
0  1635747118006744249   49.50
1  1635747144020144197  -85.77
1  1635747224568306088 -861.06
2  1635747242571214411 -836.40

Pandas: Adding Two Columns From Two Dataframes With Different Time Series

Answers (1)

original answer

Related Questions