Reputation: 214
I want to write a running count of consecutive groups of identical values to a separate column. However, I want to disregard NaN values, so that those rows are skipped and the count continues with the next valid row.
import numpy as np
import pandas as pd

d = {'Item': [np.nan, "Blue", "Blue", np.nan, "Red", "Blue", "Blue", "Red"]}
df = pd.DataFrame(data=d)
df['count'] = df.Item.ne(df.Item.shift()).cumsum()
Intended output:
   Item count
0   NaN   NaN
1  Blue     1
2  Blue     1
3   NaN   NaN
4   Red     2
5  Blue     3
6  Blue     3
7   Red     4
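For reference, here is a minimal, self-contained sketch of what the attempt above currently produces (the variable name `attempt` is only for illustration). Because NaN compares as "not equal" to everything, each NaN row and each row right after a NaN increments the counter:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Item": [np.nan, "Blue", "Blue", np.nan, "Red", "Blue", "Blue", "Red"]})

# ne() treats a comparison involving NaN as "different", so NaN rows
# (and the rows immediately after them) are counted as changes.
attempt = df["Item"].ne(df["Item"].shift()).cumsum()
print(attempt.tolist())  # [1, 2, 2, 3, 4, 5, 5, 6]
```

So the NaN rows are counted instead of skipped, which is what I am trying to avoid.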
Upvotes: 2
Views: 663
Reputation: 24314
Try:
df['count'] = (df.Item.ne(df.Item.shift()) & df.Item.notna()).cumsum().mask(df.Item.isna())
Or, as suggested by @SeanBean:
df['count'] = df.Item.ne(df.Item.shift()).mask(df.Item.isna()).cumsum()
Output of df:
   Item  count
0   NaN    NaN
1  Blue    1.0
2  Blue    1.0
3   NaN    NaN
4   Red    2.0
5  Blue    3.0
6  Blue    3.0
7   Red    4.0
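To make the first one-liner easier to follow, here is a rough step-by-step sketch of the same idea. The intermediate names (`is_valid`, `changed`, `starts_group`) are made up for illustration and are not part of pandas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Item": [np.nan, "Blue", "Blue", np.nan, "Red", "Blue", "Blue", "Red"]})

# True for rows that hold a real value
is_valid = df["Item"].notna()

# True where the value differs from the row above (always True around NaN)
changed = df["Item"].ne(df["Item"].shift())

# Only non-NaN rows are allowed to start a new group
starts_group = changed & is_valid

# Running count of group starts, then blank out the NaN rows
df["count"] = starts_group.cumsum().mask(~is_valid)
print(df)
```

The `& is_valid` step is what stops NaN rows from incrementing the counter; the final `mask` only hides the carried-over count on those rows.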
Upvotes: 1
Reputation: 14949
Here's one way (you just need to add a where condition to your original expression):
df['count'] = df.Item.ne(df.Item.shift()).where(~df.Item.isna()).cumsum()
   Item  count
0   NaN    NaN
1  Blue    1.0
2  Blue    1.0
3   NaN    NaN
4   Red    2.0
5  Blue    3.0
6  Blue    3.0
7   Red    4.0
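If you want whole numbers (1 rather than 1.0), as in the intended output, one possible follow-up is to cast to pandas' nullable integer dtype. The sketch below is a variation, not the exact line above: it applies the condition after `cumsum` so the intermediate stays a plain integer Series, and the names `changed`, `valid`, `via_where` and `via_mask` are only illustrative. It also shows that `where(cond)` and `mask(~cond)` are interchangeable here:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Item": [np.nan, "Blue", "Blue", np.nan, "Red", "Blue", "Blue", "Red"]})

changed = df["Item"].ne(df["Item"].shift())
valid = df["Item"].notna()

# where() keeps values where the condition is True;
# mask() replaces values where the condition is True.
via_where = (changed & valid).cumsum().where(valid)
via_mask = (changed & valid).cumsum().mask(~valid)

# Nullable integer dtype: whole numbers, with <NA> on the skipped rows
df["count"] = via_where.astype("Int64")
print(df)
```

The float column appears in the first place because a regular integer column cannot hold NaN; `Int64` (capital I) is the nullable variant that can represent the missing rows.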
Upvotes: 1