Reputation: 1145
I've got the following Pandas DataFrame df:
Value
time Position
1493791210867023000 0.0 21156.0
1.0 1230225.0
2.0 1628088.0
3.0 2582359.0
4.0 3388164.0
1493791210880251000 0.0 21156.0
1.0 1230225.0
2.0 1628088.0
3.0 2582359.0
4.0 3388164.0
1493791210888418000 0.0 21156.0
1.0 1230225.0
... ... ...
How can I efficiently sum along the index "Position"? The exact summation formula I am trying to implement is:
Value Result
time Position
1493791210867023000 0.0 21156.0 Sum from 0.0 to 0.0
1.0 1230225.0 Sum from 0.0 to 1.0
2.0 1628088.0 Sum from 0.0 to 2.0
3.0 2582359.0 Sum from 0.0 to 3.0
4.0 3388164.0 Sum from 0.0 to 4.0
1493791210880251000 0.0 21156.0 Sum from 0.0 to 0.0
1.0 1230225.0 Sum from 0.0 to 1.0
2.0 1628088.0 Sum from 0.0 to 2.0
3.0 2582359.0 Sum from 0.0 to 3.0
... ... ... ...
My current solution takes too long (IndexSlice is painfully slow) and I am not all too sure, as to how I could sort the sum-results efficiently into the (newly created) "Result" column?
import pandas as pd
import numpy as np
idx = pd.IndexSlice
res = {}
for i in range(5):
res[i] = df.loc[idx[:, :i]].groupby(level="time").sum()
df["Result"] = 0 #fill Result now with res[i] depending on position
Upvotes: 2
Views: 290
Reputation: 294258
Try to use cumsum
within a groupby
df.assign(Result=df.groupby(level='time').Value.cumsum())
# suggested by @ScottBoston for pandas 0.20.1+
# df.assign(Result=df.groupby('time').Value.cumsum())
Value Result
time Position
1493791210867023000 0.0 21156.0 21156.0
1.0 1230225.0 1251381.0
2.0 1628088.0 2879469.0
3.0 2582359.0 5461828.0
4.0 3388164.0 8849992.0
1493791210880251000 0.0 21156.0 21156.0
1.0 1230225.0 1251381.0
2.0 1628088.0 2879469.0
3.0 2582359.0 5461828.0
4.0 3388164.0 8849992.0
1493791210888418000 0.0 21156.0 21156.0
1.0 1230225.0 1251381.0
Upvotes: 4