Reputation: 77
I know this question has been asked before, but I can't find an answer that is simple enough to understand and fits my problem. I have a column in a dataframe and I want to keep a running total (cumsum) of it that resets to 0 on NaN values:
Index s_number s_cumsum
0 1 1
1 4 5
2 6 11
3 NaN 0
4 7 7
5 2 9
6 3 12
Upvotes: 2
Views: 2192
Reputation: 11
Replace each NaN with the negative of the sum of all values before it; the cumsum then drops back to 0 at those rows.
I doubled the df to show that it works across more than one reset.
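# Overwrite each NaN with minus the sum of everything up to that row
# (sum() skips the NaN itself, and earlier NaNs are already replaced).
for i in df.index[df['s_number'].isna()]:
    df.loc[i, 's_number'] = -df.loc[:i, 's_number'].sum()
df['s_cumsum'] = df['s_number'].cumsum()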
index s_number s_cumsum
0 0 1.0 1
1 1 4.0 5
2 2 6.0 11
3 3 -11.0 0
4 4 7.0 7
5 5 2.0 9
6 6 3.0 12
7 0 1.0 13
8 1 4.0 17
9 2 6.0 23
10 3 -23.0 0
11 4 7.0 7
12 5 2.0 9
13 6 3.0 12
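The same "subtract the running total at each NaN" idea can also be written without a loop and without overwriting s_number. This is only a sketch of a variation, not the code from the answer above; the names total and offset are made up for illustration, and the sample data is the question's column repeated twice.

```python
import numpy as np
import pandas as pd

# Question's sample column, repeated twice so there are two resets.
df = pd.DataFrame({"s_number": [1, 4, 6, np.nan, 7, 2, 3] * 2})

s = df["s_number"]
# Plain running total, treating NaN as 0.
total = s.fillna(0).cumsum()
# At each NaN row, remember the running total so far, carry it forward,
# and use 0 before the first NaN.
offset = total.where(s.isna()).ffill().fillna(0)
# Subtracting the remembered total restarts the count at every NaN.
df["s_cumsum"] = total - offset
```

Here s_number keeps its NaNs, so the original data is still available afterwards.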
Upvotes: 1
Reputation: 402483
Use groupby and cumsum:
df['s_cumsum'] = df.s_number.groupby(df.s_number.isna().cumsum()).cumsum()
df
Index s_number s_cumsum
0 0 1.0 1.0
1 1 4.0 5.0
2 2 6.0 11.0
3 3 NaN NaN
4 4 7.0 7.0
5 5 2.0 9.0
6 6 3.0 12.0
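To see why this works, it can help to look at the grouping key on its own. The sketch below rebuilds the question's data (the frame construction and the extra "group" column are assumptions added for illustration) and stores the key next to the values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"s_number": [1, 4, 6, np.nan, 7, 2, 3]})

# isna() is True only on NaN rows; its cumsum increases by one at each
# NaN, so every NaN row starts a new group label.
group_key = df["s_number"].isna().cumsum()
df["group"] = group_key

# cumsum within each group restarts the running total at every NaN.
df["s_cumsum"] = df["s_number"].groupby(group_key).cumsum()
```

Rows 0 to 2 get group 0 and rows 3 to 6 get group 1, so the total starts over at row 3.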
Note that if "s_number" is a column of strings, use
df['s_number'] = pd.to_numeric(df['s_number'], errors='coerce')
...first, to get a float column with NaNs.
If you want to fill the NaNs,
df['s_cumsum'] = (df.s_number.groupby(df.s_number.isna().cumsum())
                    .cumsum()
                    .fillna(0, downcast='infer'))
df
Index s_number s_cumsum
0 0 1.0 1
1 1 4.0 5
2 2 6.0 11
3 3 NaN 0
4 4 7.0 7
5 5 2.0 9
6 6 3.0 12
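Recent pandas releases deprecate the downcast argument of fillna, so on a newer install a possible alternative is to fill and then cast explicitly. This is a sketch that assumes every value in the column is a whole number; with fractional values, drop the astype call and keep floats.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"s_number": [1, 4, 6, np.nan, 7, 2, 3]})

key = df["s_number"].isna().cumsum()
df["s_cumsum"] = (
    df["s_number"]
    .groupby(key)
    .cumsum()
    .fillna(0)          # NaN rows become 0
    .astype("int64")    # explicit cast instead of downcast='infer'
)
```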
Upvotes: 9