Bython
Bython

Reputation: 1145

Sum along Pandas Column conditional on MultiIndex value?

I've got the following Pandas DataFrame df:

                                 Value
time                 Position
1493791210867023000  0.0           21156.0
                     1.0         1230225.0
                     2.0         1628088.0
                     3.0         2582359.0
                     4.0         3388164.0
1493791210880251000  0.0           21156.0
                     1.0         1230225.0
                     2.0         1628088.0
                     3.0         2582359.0
                     4.0         3388164.0
1493791210888418000  0.0           21156.0
                     1.0         1230225.0
...                  ...         ...

How can I efficiently sum along the index "Position"? The exact summation formula I am trying to implement is:

                                 Value        Result
time                 Position
1493791210867023000  0.0           21156.0    Sum from 0.0 to 0.0
                     1.0         1230225.0    Sum from 0.0 to 1.0
                     2.0         1628088.0    Sum from 0.0 to 2.0
                     3.0         2582359.0    Sum from 0.0 to 3.0
                     4.0         3388164.0    Sum from 0.0 to 4.0
1493791210880251000  0.0           21156.0    Sum from 0.0 to 0.0
                     1.0         1230225.0    Sum from 0.0 to 1.0
                     2.0         1628088.0    Sum from 0.0 to 2.0
                     3.0         2582359.0    Sum from 0.0 to 3.0
...                  ...         ...          ...

My current solution takes too long (IndexSlice is painfully slow) and I am not all too sure, as to how I could sort the sum-results efficiently into the (newly created) "Result" column?

import pandas as pd
import numpy as np
idx = pd.IndexSlice
res = {}
for i in range(5):
    res[i] = df.loc[idx[:, :i]].groupby(level="time").sum()
df["Result"] = 0 #fill Result now with res[i] depending on position

Upvotes: 2

Views: 290

Answers (1)

piRSquared
piRSquared

Reputation: 294258

Try to use cumsum within a groupby

df.assign(Result=df.groupby(level='time').Value.cumsum())
# suggested by @ScottBoston for pandas 0.20.1+
# df.assign(Result=df.groupby('time').Value.cumsum())

                                  Value     Result
time                Position                      
1493791210867023000 0.0         21156.0    21156.0
                    1.0       1230225.0  1251381.0
                    2.0       1628088.0  2879469.0
                    3.0       2582359.0  5461828.0
                    4.0       3388164.0  8849992.0
1493791210880251000 0.0         21156.0    21156.0
                    1.0       1230225.0  1251381.0
                    2.0       1628088.0  2879469.0
                    3.0       2582359.0  5461828.0
                    4.0       3388164.0  8849992.0
1493791210888418000 0.0         21156.0    21156.0
                    1.0       1230225.0  1251381.0

Upvotes: 4

Related Questions