Cumsum with nan values - pandas

Question

I want to write a cumulative count to a separate column that increases by one each time the value in Item changes. However, I want to disregard NaN values: those rows should be skipped (left as NaN), and the count should continue with the next valid row.

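import numpy as np
import pandas as pd

d = {'Item': [np.nan, "Blue", "Blue", np.nan, "Red", "Blue", "Blue", "Red"]}
df = pd.DataFrame(data=d)

# Increments on every change of value, but also counts the NaN rows
df['count'] = df.Item.ne(df.Item.shift()).cumsum()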

Intended output:

   Item  count
0   NaN    NaN
1  Blue      1
2  Blue      1
3   NaN    NaN 
4   Red      2
5  Blue      3
6  Blue      3
7   Red      4

Anurag Dabas · Accepted Answer

Try:

df['count'] = (df.Item.ne(df.Item.shift()) & df.Item.notna()).cumsum().mask(df.Item.isna())

Or, as suggested by @SeanBean:

df['count'] = df.Item.ne(df.Item.shift()).mask(df.Item.isna()).cumsum()

Output of df:

    Item    count
0   NaN     NaN
1   Blue    1.0
2   Blue    1.0
3   NaN     NaN
4   Red     2.0
5   Blue    3.0
6   Blue    3.0
7   Red     4.0
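
To see why the first one-liner works, here is a rough step-by-step sketch of the same expression. The intermediate names `changed`, `starts` and `count_int` are only illustrative and not part of the original answer. The final `astype("Int64")` step is an optional extra, assuming you would rather see 1, 2, 3 than 1.0, 2.0, 3.0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Item": [np.nan, "Blue", "Blue", np.nan, "Red", "Blue", "Blue", "Red"]}
)

# True wherever the value differs from the row above.
# NaN compares unequal to everything, so NaN rows are also flagged here.
changed = df["Item"].ne(df["Item"].shift())

# Keep only the changes that land on a real value,
# so a NaN row never starts a new group.
starts = changed & df["Item"].notna()

# Running total of group starts, then blank out the NaN rows again.
df["count"] = starts.cumsum().mask(df["Item"].isna())

# Optional (assumption: integer display is wanted): the nullable integer
# dtype keeps the missing rows as <NA> instead of forcing floats.
count_int = df["count"].astype("Int64")
```

Note that a value following a NaN row is always treated as a new group, even if it matches the last value before the gap, because the comparison is made against the NaN directly above it.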
