Reputation: 2649
I have two dataframes (let's call them df_1
and df_2
):
epoch_ns net
1635747118006744249 64.75
1635747144020144197 -135.27
epoch_ns net
1635747085588589391 -15.25
1635747224568306088 -775.29
1635747242571214411 24.66
I would like to add both df_1['net']
and df_2['net']
together to have a resultant dataframe like this:
epoch_ns net
1635747085588589391 -15.25
1635747118006744249 49.5
1635747144020144197 -150.52
1635747224568306088 -910.56
1635747242571214411 −910.56
One way is to resample both frames with 1ms frequency (losing the precision) but it could use a lot of memory when my both frames have a long time horizontal.
Any other more efficient way (time wise and memory wise)?
Upvotes: 0
Views: 187
Reputation: 260640
You can use pandas.concat
, and cumsum
.
As you already have a cumulated sum as input, you first need to take the diff:
(pd.concat([df1.assign(net=df1['net'].diff().combine_first(df1['net'])),
df2.assign(net=df2['net'].diff().combine_first(df2['net'])),
])
.sort_values(by='epoch_ns')
.assign(net=lambda d:d['net'].cumsum())
)
output:
epoch_ns net
0 1635747085588589391 -15.25
0 1635747118006744249 49.50
1 1635747144020144197 -150.52
1 1635747224568306088 -910.56
2 1635747242571214411 -110.61
intermediate step to uncumulate the net (example on df2):
>>> df2.assign(raw=df2['net'].diff().combine_first(df2['net']))
epoch_ns net raw
0 1635747085588589391 -15.25 -15.25
1 1635747224568306088 -775.29 -760.04
2 1635747242571214411 24.66 799.95
(pd.concat([df1,df2])
.sort_values(by='epoch_ns')
.assign(net=lambda d:d['net'].cumsum())
)
Output:
epoch_ns net
0 1635747085588589391 -15.25
0 1635747118006744249 49.50
1 1635747144020144197 -85.77
1 1635747224568306088 -861.06
2 1635747242571214411 -836.40
Upvotes: 1