Reputation: 513
I have this df:
CODE DATE TMAX TMIN PP
0 000130 1991-01-01 NaN NaN 0.0
1 000130 1991-01-02 31.2 NaN 0.0
2 000130 1991-01-03 32.0 21.2 0.0
3 000130 1991-01-04 NaN NaN 0.0
4 000130 1991-01-05 NaN 22.0 0.0
... ... ... ... ...
34995 000135 1997-04-24 NaN NaN 0.0
34996 000135 1997-04-25 NaN NaN 4.0
34997 000135 1997-04-26 NaN 22.1 0.0
34998 000135 1997-04-27 31.0 NaN 5.0
34999 000135 1997-04-28 28.8 24.0 0.0
I'm counting the NaN values per CODE in the columns TMAX, TMIN and PP, using this code:
dfna=df[['TMAX','TMIN','PP']].isna().groupby(df.CODE).sum()
But I want to start counting NaN values only after the first non-NaN value in each column (per CODE), so leading NaNs are ignored.
Expected df:
CODE TMAX TMIN PP
000130 2 1 0
000135 0 1 0
...
...
How can I do this?
Thanks in advance.
Upvotes: 2
Views: 30
Reputation: 150745
Thinking in terms of the whole frame, you can use ffill
to fill every NaN that comes after a valid value, so only the leading NaNs stay NaN. That lets you detect the NaNs that come after the first valid value:
df.isna() & df.ffill().notna()
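To see why that mask works, here is a minimal sketch on a single column. The values are the TMAX readings for CODE 000130 from the question; the variable names are just for illustration.

```python
import numpy as np
import pandas as pd

# TMAX values for CODE 000130, taken from the question's sample rows
s = pd.Series([np.nan, 31.2, 32.0, np.nan, np.nan], name="TMAX")

# ffill leaves the leading NaN alone but fills the two trailing ones,
# so "is NaN now, but not NaN after ffill" marks NaNs after the first valid value
after_first_valid = s.isna() & s.ffill().notna()

count = int(after_first_valid.sum())
print(after_first_valid.tolist())
print(count)
```

The leading NaN is excluded because there is nothing before it to forward-fill from.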
A plain ffill on the whole frame would carry values from one CODE into the next, so do it per group with groupby.apply:
(df[['TMAX','TMIN','PP']].groupby(df['CODE'])
.apply(lambda d: (d.isna() & d.ffill().notna()).sum())
)
Output:
TMAX TMIN PP
CODE
130 2 1 0
135 0 1 0
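If you want something you can run end to end, here is a sketch that rebuilds only the ten rows visible in the question (the elided rows are left out, and CODE is kept as strings, which is my assumption about the real data). It also shows a variant that avoids apply by forward-filling within each group first; the names cols, filled, mask and via_apply are mine.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Only the rows shown in the question; the "..." rows are not reproduced
df = pd.DataFrame({
    "CODE": ["000130"] * 5 + ["000135"] * 5,
    "DATE": list(pd.date_range("1991-01-01", periods=5))
            + list(pd.date_range("1997-04-24", periods=5)),
    "TMAX": [nan, 31.2, 32.0, nan, nan, nan, nan, nan, 31.0, 28.8],
    "TMIN": [nan, nan, 21.2, nan, 22.0, nan, nan, 22.1, nan, 24.0],
    "PP":   [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 5.0, 0.0],
})
cols = ["TMAX", "TMIN", "PP"]

# Variant without apply: forward-fill inside each CODE group,
# then flag values that are NaN originally but filled afterwards
filled = df.groupby("CODE")[cols].ffill()
mask = df[cols].isna() & filled.notna()
dfna = mask.groupby(df["CODE"]).sum()

# Same result via groupby.apply, as in the answer above
via_apply = (df[cols].groupby(df["CODE"])
             .apply(lambda d: (d.isna() & d.ffill().notna()).sum()))

print(dfna)
```

Both versions should give the same counts; the non-apply one is likely to be faster when there are many groups, though I haven't timed it on the full data.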
Upvotes: 2