Chopin

Reputation: 214

Cumsum with nan values - pandas

I want to add a column holding a cumulative count that increases each time the value in Item changes. However, I want to disregard NaN values: those rows should be skipped (left as NaN in the new column), and the count should continue with the next non-NaN row.

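import numpy as np
import pandas as pd

d = {'Item': [np.nan, "Blue", "Blue", np.nan, "Red", "Blue", "Blue", "Red"]}

df = pd.DataFrame(data=d)

# Current attempt: increments on every change, so NaN rows are counted as well
df['count'] = df.Item.ne(df.Item.shift()).cumsum()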

Intended output:

   Item  count
0   NaN    NaN
1  Blue      1
2  Blue      1
3   NaN    NaN
4   Red      2
5  Blue      3
6  Blue      3
7   Red      4

Upvotes: 2

Views: 663

Answers (2)

Anurag Dabas

Reputation: 24314

Try:

df['count'] = (df.Item.ne(df.Item.shift()) & df.Item.notna()).cumsum().mask(df.Item.isna())

Or, as suggested by @SeanBean:

df['count'] = df.Item.ne(df.Item.shift()).mask(df.Item.isna()).cumsum()

Output of df:

    Item    count
0   NaN     NaN
1   Blue    1.0
2   Blue    1.0
3   NaN     NaN
4   Red     2.0
5   Blue    3.0
6   Blue    3.0
7   Red     4.0
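
To make it easier to see why the first expression works, here is a rough step-by-step sketch of it. The intermediate names (changed, valid, new_run) are only illustrative and are not part of the original one-liner:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Item": [np.nan, "Blue", "Blue", np.nan, "Red", "Blue", "Blue", "Red"]}
)

# True where the value differs from the previous row.
# NaN never compares equal, so NaN rows and the rows right after them are True.
changed = df["Item"].ne(df["Item"].shift())

# True for rows that hold an actual value.
valid = df["Item"].notna()

# A new run starts only on a non-NaN row whose value changed.
new_run = changed & valid

# Running total of run starts, then blank out the NaN rows again.
df["count"] = new_run.cumsum().mask(~valid)

print(df)
```

Note that a value appearing on both sides of a NaN row (for example Blue, NaN, Blue) is treated as two separate runs here, because the row after the NaN always counts as a change.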

Upvotes: 1

Nk03

Reputation: 14949

Here's one way. You just need to add a where condition to your existing code:

df['count'] = df.Item.ne(df.Item.shift()).where(~df.Item.isna()).cumsum()

OUTPUT:

   Item  count
0   NaN    NaN
1  Blue    1.0
2  Blue    1.0
3   NaN    NaN
4   Red    2.0
5  Blue    3.0
6  Blue    3.0
7   Red    4.0
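
The count comes out as floats (1.0, 2.0, ...) because NaN forces a float column, whereas the intended output in the question shows integers. If that matters, one possible option is pandas' nullable Int64 dtype. The sketch below wraps this in a helper; run_counter is a hypothetical name used only for illustration, and it applies the NaN filter before the cumsum rather than reproducing the exact line above:

```python
import numpy as np
import pandas as pd


def run_counter(items: pd.Series) -> pd.Series:
    """Hypothetical helper: number each run of equal non-NaN values."""
    valid = items.notna()
    # A run starts where the value differs from the previous row and is not NaN.
    starts = items.ne(items.shift()) & valid
    # cumsum gives integers; where() puts NaN back on the skipped rows (float64);
    # astype("Int64") converts to nullable integers, with <NA> for missing rows.
    return starts.cumsum().where(valid).astype("Int64")


df = pd.DataFrame(
    {"Item": [np.nan, "Blue", "Blue", np.nan, "Red", "Blue", "Blue", "Red"]}
)
df["count"] = run_counter(df["Item"])
print(df)
```

The skipped rows then show as <NA> instead of NaN, and the remaining values print without the trailing .0.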

Upvotes: 1
