Pandas dataframe, cumsum reset on NaN

Question

I know this question has been asked before, but I can't find an answer that is simple enough to understand and also fits my problem. I have a column in a dataframe and I want to keep a running total (cumsum) of this column, resetting it on NaN values:

 Index  s_number  s_cumsum
  0       1         1
  1       4         5
  2       6         11
  3       NaN       0
  4       7         7
  5       2         9
  6       3         12
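For reference, the behaviour the table describes can be spelled out with a plain loop. This is only a minimal sketch of the semantics (the helper name `cumsum_reset_on_nan` is hypothetical); it is slow on large frames, which is why the vectorised pandas approach is preferable.

```python
import math

def cumsum_reset_on_nan(values):
    """Hypothetical helper: running total that reports 0 on a NaN and restarts there."""
    out, total = [], 0
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            total = 0  # reset: the NaN row itself reports 0
        else:
            total += v  # keep accumulating until the next NaN
        out.append(total)
    return out

print(cumsum_reset_on_nan([1, 4, 6, float('nan'), 7, 2, 3]))  # [1, 5, 11, 0, 7, 9, 12]
```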

cs95 · Accepted Answer

Use groupby and cumsum:

df['s_cumsum'] = df.s_number.groupby(df.s_number.isna().cumsum()).cumsum()
df

   Index  s_number  s_cumsum
0      0       1.0       1.0
1      1       4.0       5.0
2      2       6.0      11.0
3      3       NaN       NaN
4      4       7.0       7.0
5      5       2.0       9.0
6      6       3.0      12.0
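To see why this resets, it helps to look at the group keys on their own. A short sketch, rebuilding the question's sample data: `isna().cumsum()` increments once per NaN, so every row from one NaN up to the next shares a key, and `cumsum` restarts within each key.

```python
import numpy as np
import pandas as pd

# Sample data from the question (NaN in row 3)
df = pd.DataFrame({'s_number': [1, 4, 6, np.nan, 7, 2, 3]})

# Each NaN bumps the counter, so each stretch between NaNs gets its own key
keys = df.s_number.isna().cumsum()
print(keys.tolist())  # [0, 0, 0, 1, 1, 1, 1]

# cumsum runs separately inside each key group, so the total restarts after a NaN;
# the NaN row itself stays NaN in the result
df['s_cumsum'] = df.s_number.groupby(keys).cumsum()
print(df)
```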

Note that if "s_number" is a column of strings, use

df['s_number'] = pd.to_numeric(df['s_number'], errors='coerce')

...first, to get a float column with NaNs.
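A small sketch of that conversion, assuming a hypothetical string column where `'missing'` stands in for any non-numeric entry: `errors='coerce'` turns it into NaN, after which the same groupby/cumsum works unchanged.

```python
import pandas as pd

# Hypothetical string column; 'missing' is any value that cannot be parsed as a number
s = pd.Series(['1', '4', '6', 'missing', '7', '2', '3'])

# Unparseable strings become NaN, and the result is a float column
nums = pd.to_numeric(s, errors='coerce')
print(nums.dtype)  # float64

# Same reset-on-NaN running total as above
running = nums.groupby(nums.isna().cumsum()).cumsum()
print(running.tolist())
```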

If you want to fill the NaNs,

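df['s_cumsum'] = (df.s_number.groupby(df.s_number.isna().cumsum())
                    .cumsum()
                    .fillna(0)
                    # fillna(0, downcast='infer') is deprecated in pandas 2.x;
                    # astype(int) assumes the values are whole numbers
                    .astype(int))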
df

   Index  s_number  s_cumsum
0      0       1.0         1
1      1       4.0         5
2      2       6.0        11
3      3       NaN         0
4      4       7.0         7
5      5       2.0         9
6      6       3.0        12
