Reputation: 411
I have a dataframe
that contains multiple "stacks" and their corresponding "lengths".
df = pd.DataFrame({'stack-1-material': ['rock', 'paper', 'paper', 'scissors', 'rock'], 'stack-2-material': ['rock', 'paper', 'rock', 'paper', 'scissors'], 'stack-1-length': [3, 1, 1, 2, 3], 'stack-2-length': [3, 1, 3, 1, 2]})
stack-1-material stack-2-material stack-1-length stack-2-length
0 rock rock 3 3
1 paper paper 1 1
2 paper rock 1 3
3 scissors paper 2 1
4 rock scissors 3 2
I am trying to create a separate column for each material that tracks the cumulative sum of the length regardless of which "stack" they're. I've tried using groupby
but am only able to get the cumulative sum into a single column. Here is what I'm looking for:
stack-1-material stack-2-material stack-1-length stack-2-length rock_cumsum paper_cumsum scissors_cumsum
0 rock rock 3 3 6 0 0
1 paper paper 1 1 6 2 0
2 paper rock 1 3 9 3 0
3 scissors paper 2 1 9 4 2
4 rock scissors 3 2 12 4 4
Upvotes: 3
Views: 536
Reputation: 29635
you can use the columns materials as mask on the columns length, then sum
along the column and cumsum
, for each material.
#separate material and length
material = df.filter(like='material').to_numpy()
lentgh = df.filter(like='length')
# get all unique material
l_mat = np.unique(material)
# iterate over nique materials
for mat in l_mat:
df[f'{mat}_cumsum'] = lentgh.where(material==mat).sum(axis=1).cumsum()
print(df)
stack-1-material stack-2-material stack-1-length stack-2-length \
0 rock rock 3 3
1 paper paper 1 1
2 paper rock 1 3
3 scissors paper 2 1
4 rock scissors 3 2
rock_cumsum paper_cumsum scissors_cumsum
0 6.0 0.0 0.0
1 6.0 2.0 0.0
2 9.0 3.0 0.0
3 9.0 4.0 2.0
4 12.0 4.0 4.0
Upvotes: 6
Reputation: 59519
First, reverse your column names that way we can use wide_to_long
to reshape the DataFrame.
Then take the cumsum
within material and determine the max value per material per row. We can then reshape
this and ffill
and replace the remaining NaN
with 0s and join back to the original.
df.columns = ['-'.join(x[::-1]) for x in df.columns.str.rsplit('-', n=1)]
res = (pd.wide_to_long(df.reset_index(), stubnames=['material', 'length'],
i='index', j='whatever', suffix='.*')
.sort_index(level=0))
# material length
#index whatever
#0 -stack-1 rock 3
# -stack-2 rock 3
#1 -stack-1 paper 1
# -stack-2 paper 1
#2 -stack-1 paper 1
# -stack-2 rock 3
#3 -stack-1 scissors 2
# -stack-2 paper 1
#4 -stack-1 rock 3
# -stack-2 scissors 2
res['csum'] = res.groupby('material')['length'].cumsum()
res = (res.groupby(['index', 'material'])['csum'].max()
.unstack(-1).ffill().fillna(0, downcast='infer')
.add_suffix('_cumsum'))
df = pd.concat([df, res], axis=1)
material-stack-1 material-stack-2 length-stack-1 length-stack-2 paper_cumsum rock_cumsum scissors_cumsum
0 rock rock 3 3 0 6 0
1 paper paper 1 1 2 6 0
2 paper rock 1 3 3 9 0
3 scissors paper 2 1 4 9 2
4 rock scissors 3 2 4 12 4
Upvotes: 3