Luxo_Jr
Luxo_Jr

Reputation: 411

Conditional Cumulative Sum Based Multiple Pandas Columns

I have a dataframe that contains multiple "stacks" and their corresponding "lengths".

df = pd.DataFrame({'stack-1-material': ['rock', 'paper', 'paper', 'scissors', 'rock'], 'stack-2-material': ['rock', 'paper', 'rock', 'paper', 'scissors'], 'stack-1-length': [3, 1, 1, 2, 3], 'stack-2-length': [3, 1, 3, 1, 2]})

  stack-1-material stack-2-material  stack-1-length  stack-2-length
0             rock             rock               3               3
1            paper            paper               1               1
2            paper             rock               1               3
3         scissors            paper               2               1
4             rock         scissors               3               2

I am trying to create a separate column for each material that tracks the cumulative sum of the length regardless of which "stack" they're. I've tried using groupby but am only able to get the cumulative sum into a single column. Here is what I'm looking for:

  stack-1-material stack-2-material  stack-1-length  stack-2-length  rock_cumsum  paper_cumsum  scissors_cumsum
0             rock             rock               3               3            6             0                0
1            paper            paper               1               1            6             2                0
2            paper             rock               1               3            9             3                0
3         scissors            paper               2               1            9             4                2
4             rock         scissors               3               2           12             4                4 

Upvotes: 3

Views: 536

Answers (2)

Ben.T
Ben.T

Reputation: 29635

you can use the columns materials as mask on the columns length, then sum along the column and cumsum, for each material.

#separate material and length
material = df.filter(like='material').to_numpy()
lentgh = df.filter(like='length')

# get all unique material
l_mat = np.unique(material)

# iterate over nique materials
for mat in l_mat:
    df[f'{mat}_cumsum'] = lentgh.where(material==mat).sum(axis=1).cumsum()

print(df)
  stack-1-material stack-2-material  stack-1-length  stack-2-length  \
0             rock             rock               3               3   
1            paper            paper               1               1   
2            paper             rock               1               3   
3         scissors            paper               2               1   
4             rock         scissors               3               2   

   rock_cumsum  paper_cumsum  scissors_cumsum  
0          6.0           0.0              0.0  
1          6.0           2.0              0.0  
2          9.0           3.0              0.0  
3          9.0           4.0              2.0  
4         12.0           4.0              4.0  

Upvotes: 6

ALollz
ALollz

Reputation: 59519

First, reverse your column names that way we can use wide_to_long to reshape the DataFrame.

Then take the cumsum within material and determine the max value per material per row. We can then reshape this and ffill and replace the remaining NaN with 0s and join back to the original.

df.columns = ['-'.join(x[::-1]) for x in df.columns.str.rsplit('-', n=1)]

res = (pd.wide_to_long(df.reset_index(), stubnames=['material', 'length'], 
                       i='index', j='whatever', suffix='.*')
         .sort_index(level=0))

#                material  length
#index whatever                  
#0     -stack-1      rock       3
#      -stack-2      rock       3
#1     -stack-1     paper       1
#      -stack-2     paper       1
#2     -stack-1     paper       1
#      -stack-2      rock       3
#3     -stack-1  scissors       2
#      -stack-2     paper       1
#4     -stack-1      rock       3
#      -stack-2  scissors       2

res['csum'] = res.groupby('material')['length'].cumsum()
res = (res.groupby(['index', 'material'])['csum'].max()
          .unstack(-1).ffill().fillna(0, downcast='infer')
          .add_suffix('_cumsum'))

df = pd.concat([df, res], axis=1)

  material-stack-1 material-stack-2  length-stack-1  length-stack-2  paper_cumsum  rock_cumsum  scissors_cumsum
0             rock             rock               3               3             0            6                0
1            paper            paper               1               1             2            6                0
2            paper             rock               1               3             3            9                0
3         scissors            paper               2               1             4            9                2
4             rock         scissors               3               2             4           12                4

Upvotes: 3

Related Questions