Reputation: 4307
I have the following DataFrame:
import pandas as pd
df = pd.DataFrame.from_dict({'id': ['A', 'B', 'A', 'C', 'D', 'B', 'C'], 'val': [1, 2, -3, 1, 5, 6, -2], 'stuff': ['12', '23232', '13', '1234', '3235', '3236', '732323']})
  id   stuff  val
0  A      12    1
1  B   23232    2
2  A      13   -3
3  C    1234    1
4  D    3235    5
5  B    3236    6
6  C  732323   -2
I'd like to get a running sum of val for each id, so the desired output looks like this:
  id   stuff  val  cumsum
0  A      12    1       1
1  B   23232    2       2
2  A      13   -3      -2
3  C    1234    1       1
4  D    3235    5       5
5  B    3236    6       8
6  C  732323   -2      -1
This is what I tried:
df['cumsum'] = df.groupby('id').cumsum(['val'])
This is the error I get:
ValueError: Wrong number of items passed 0, placement implies 1
Upvotes: 61
Views: 63712
Reputation: 23041
cumsum is one of those functions (e.g. cumprod, rank, etc.) that return a Series / DataFrame indexed the same as the original DataFrame, so all methods of supplying a function to groupby work (and produce the same output).
All of the following are equivalent.
x = df.groupby('id')['val'].agg('cumsum')
y = df.groupby('id')['val'].apply('cumsum')
z = df.groupby('id')['val'].cumsum()
w = df.groupby('id')['val'].transform('cumsum')
all(x.equals(d) for d in [y, z, w]) # True
Also, df.groupby('id').cumsum() computes the cumulative sum of all the remaining columns in df (everything except the grouping column), grouped by 'id'.
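To illustrate the multi-column case, here is a minimal sketch. The extra numeric column val2 and the _cumsum suffix are made up for the example and are not part of the question. The numeric columns are selected explicitly, since how cumsum treats a string column such as stuff may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'B', 'A', 'C', 'D', 'B', 'C'],
    'val': [1, 2, -3, 1, 5, 6, -2],
    'val2': [10, 20, 30, 40, 50, 60, 70],  # hypothetical second numeric column
})

# Selecting a list of columns keeps a DataFrame groupby, so cumsum
# returns a DataFrame with one running sum per selected column,
# indexed like the original df.
running = df.groupby('id')[['val', 'val2']].cumsum()

# Attach the running sums next to the original columns.
result = df.join(running.add_suffix('_cumsum'))
print(result)
```

Because the result keeps the original index, join lines the rows up without any extra sorting or merging.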
Upvotes: 2
Reputation: 393943
You can call transform and pass the cumsum function to add that column to your df:
In [156]:
df['cumsum'] = df.groupby('id')['val'].transform(pd.Series.cumsum)
df
Out[156]:
  id   stuff  val  cumsum
0  A      12    1       1
1  B   23232    2       2
2  A      13   -3      -2
3  C    1234    1       1
4  D    3235    5       5
5  B    3236    6       8
6  C  732323   -2      -1
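With respect to your error: firstly, you're calling cumsum on the DataFrame groupby object rather than on the 'val' column, so the result is a DataFrame, which can't be assigned to a single column; secondly, you're passing the name of the column as a list, which cumsum does not treat as a column selection.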
So this works:
In [159]:
df.groupby('id')['val'].cumsum()
Out[159]:
0    1
1    2
2   -2
3    1
4    5
5    8
6   -1
dtype: int64
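As a rough sketch of why the column selection matters (the variable names by_frame and by_series are only for illustration): selecting with a list keeps a DataFrame groupby, whose cumsum returns a DataFrame, while selecting a single label gives a Series groupby, whose cumsum returns a Series that can be assigned to one column.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'B', 'A', 'C', 'D', 'B', 'C'],
    'val': [1, 2, -3, 1, 5, 6, -2],
    'stuff': ['12', '23232', '13', '1234', '3235', '3236', '732323'],
})

# List selection -> DataFrame groupby -> cumsum returns a DataFrame.
by_frame = df.groupby('id')[['val']].cumsum()

# Single-label selection -> Series groupby -> cumsum returns a Series.
by_series = df.groupby('id')['val'].cumsum()

print(type(by_frame).__name__)   # DataFrame
print(type(by_series).__name__)  # Series

# The Series shares df's index, so it can be assigned as one column.
df['cumsum'] = by_series
print(df)
```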
Upvotes: 99