PParker

Reputation: 1511

Calculate the cumulative count for all NaN values in specific column

I have a dataframe:

import numpy as np
import pandas as pd

# create example df
df = pd.DataFrame(index=[1,2,3,4,5,6,7])
df['ID'] = [1,1,1,1,2,2,2]
df['election_date'] = pd.date_range("01/01/2010", periods=7, freq="M")
df['stock_price'] = [1,np.nan,np.nan,4,5,np.nan,7]

# sort values
df.sort_values(['election_date'], inplace=True, ascending=False)
df.reset_index(drop=True, inplace=True)
df

   ID   election_date   stock_price
0   2   2010-07-31       7.0
1   2   2010-06-30       NaN
2   2   2010-05-31       5.0
3   1   2010-04-30       4.0
4   1   2010-03-31       NaN
5   1   2010-02-28       NaN
6   1   2010-01-31       1.0

I would like to calculate, for every ID, the cumulative count of NaN values in column stock_price, accumulating from the oldest date and excluding the current row.

The expected result is:

df

   ID   election_date   stock_price  cum_count_nans
0   2   2010-07-31       7.0            1
1   2   2010-06-30       NaN            0
2   2   2010-05-31       5.0            0   
3   1   2010-04-30       4.0            2  
4   1   2010-03-31       NaN            1
5   1   2010-02-28       NaN            0
6   1   2010-01-31       1.0            0

Any ideas how to solve it?

Upvotes: 1

Views: 179

Answers (1)

jezrael

Reputation: 862851

The idea is to reverse the row order by indexing, then in a custom function test for missing values, shift so the current row is excluded, and take the cumulative sum:

# flag NaNs, shift one row so the current row is excluded, then cumsum
f = lambda x: x.isna().shift(fill_value=0).cumsum()
# iloc[::-1] reverses the rows so the count accumulates from the oldest date per ID
df['cum_count_nans'] = df.iloc[::-1].groupby('ID')['stock_price'].transform(f)
print(df)
   ID election_date  stock_price cum_count_nans
0   2    2010-07-31          7.0              1
1   2    2010-06-30          NaN              0
2   2    2010-05-31          5.0              0
3   1    2010-04-30          4.0              2
4   1    2010-03-31          NaN              1
5   1    2010-02-28          NaN              0
6   1    2010-01-31          1.0              0
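As a variation (not part of the original answer), the shift can be avoided: subtracting the NaN mask from its own cumulative sum excludes the current row from the count. A minimal sketch, rebuilding the question's frame:

```python
import numpy as np
import pandas as pd

# rebuild the question's example frame
df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 2, 2, 2],
    'election_date': pd.date_range("01/01/2010", periods=7, freq="M"),
    'stock_price': [1, np.nan, np.nan, 4, 5, np.nan, 7],
})
df = df.sort_values('election_date', ascending=False).reset_index(drop=True)

# cumsum of the NaN mask minus the mask itself counts only *previous* NaNs;
# iloc[::-1] still makes the count run from the oldest date upward per ID
g = lambda x: x.isna().cumsum() - x.isna()
df['cum_count_nans'] = df.iloc[::-1].groupby('ID')['stock_price'].transform(g)
print(df)
```

This gives the same result as the shift-based version, since for row i the expression equals the number of NaNs strictly before i within the group.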

Upvotes: 2
