Reputation: 233
I am trying to get sum of dynamic columns based on certain condition.
cols = ['ID','A','B','C','D','E','F','G']
dataframe df has all the columns listed above.
If ID = 2, I need sum of first two columns A, B IF ID = 3, I need sum of first three columns A,B,C
df.loc['SUM'] = df.loc[df['ID'] > 0,cols[0:df['ID']]].sum(axis=1)
above line of code is giving an error that :
TypeError: slice indices must be integers or None or have an __index__ method
Note : ID can be any number but it would be always less than or equal to total number of columns
There are no nan and missing values in dataframe.
Upvotes: 5
Views: 267
Reputation: 18628
Another way :
with df :
0 1 2 3 4 5 6 7
ID
4 6 1 4 4 8 4 6 3
7 5 8 7 9 9 2 7 8
4 8 9 2 6 9 5 4 1
3 4 6 1 3 4 9 2 4
5 4 4 8 1 2 1 5 8
sums are computed by df.cumsum(axis=1).values[range(len(df)),df.index-1]
it's 5x faster than df.where(np.arange(df.shape[1]) <df.index.values[:,None]).sum(axis=1)
.
Upvotes: 1
Reputation: 402413
You can vectorise this thanks to the power of NumPy:
df.set_index('ID', inplace=True)
df.wherenp.arange(df.shape[1]) < df.index.values[:,None]).sum(axis=1)
Minimal Reproducible Example
import pandas as pd
pd.__version__
# '0.24.2'
np.random.seed(0)
df = pd.DataFrame(np.random.randint(1, 10, (5, 8)))
df.insert(0, 'ID', np.random.randint(1, 8, 5))
df
ID 0 1 2 3 4 5 6 7
0 4 6 1 4 4 8 4 6 3
1 7 5 8 7 9 9 2 7 8
2 4 8 9 2 6 9 5 4 1
3 3 4 6 1 3 4 9 2 4
4 5 4 4 8 1 2 1 5 8
df.set_index('ID', inplace=True)
df.assign(
SUM=df.where(np.arange(df.shape[1]) < df.index.values[:,None]).sum(axis=1))
0 1 2 3 4 5 6 7 SUM
ID
4 6 1 4 4 8 4 6 3 15.0
7 5 8 7 9 9 2 7 8 47.0
4 8 9 2 6 9 5 4 1 25.0
3 4 6 1 3 4 9 2 4 11.0
5 4 4 8 1 2 1 5 8 19.0
The where
function will hide values that we don't want to sum over:
df.where(np.arange(df.shape[1]) < df.index[:,None])
0 1 2 3 4 5 6 7
ID
4 6 1 4 4.0 NaN NaN NaN NaN
7 5 8 7 9.0 9.0 2.0 7.0 NaN
4 8 9 2 6.0 NaN NaN NaN NaN
3 4 6 1 NaN NaN NaN NaN NaN
5 4 4 8 1.0 2.0 NaN NaN NaN
Thanks to @Quang Hoang for pointing out a potential bug in the solution.
Upvotes: 3