How to get a df from the first non-NaN value onwards?

Question

I have this df:

       CODE      DATE     TMAX   TMIN   PP
0      000130 1991-01-01  NaN    NaN   0.0
1      000130 1991-01-02  31.2   NaN   0.0
2      000130 1991-01-03  32.0   21.2  0.0
3      000130 1991-01-04  NaN    NaN   0.0
4      000130 1991-01-05  NaN    22.0  0.0
      ...        ...   ...   ...  ...
34995  000135 1997-04-24  NaN    NaN   0.0
34996  000135 1997-04-25  NaN    NaN   4.0
34997  000135 1997-04-26  NaN    22.1  0.0
34998  000135 1997-04-27  31.0   NaN   5.0
34999  000135 1997-04-28  28.8   24.0  0.0

I'm counting the NaN values in columns TMAX, TMIN and PP, grouped by the CODE column, with this code:

dfna=df[['TMAX','TMIN','PP']].isna().groupby(df.CODE).sum()

But I want to count only the NaN values that come after the first non-NaN value in each column, for each CODE.

Expected df:

CODE   TMAX TMIN PP
000130 2    1    0
000135 0    1    0    
...
...

How can I do this?

Thanks in advance.

Quang Hoang · Accepted Answer

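Think in terms of the whole frame: ffill fills every NaN that comes after a valid value, but leaves the leading NaNs alone because there is nothing before them to propagate. So a cell that is NaN in the original but not NaN after ffill is a NaN that comes after the first valid value: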

df.isna() & df.ffill().notna()
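
To see what that mask does, here is a minimal sketch on a single column. The Series `s` is made-up data that mirrors the TMAX values of the first CODE in the question; it is only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column: one leading NaN, then two NaNs after the first valid value.
s = pd.Series([np.nan, 31.2, 32.0, np.nan, np.nan], name="TMAX")

# ffill() cannot fill the leading NaN (nothing precedes it),
# but it fills every NaN that follows a valid value.
filled = s.ffill()

# True only where the original is NaN AND a valid value appeared earlier.
mask = s.isna() & filled.notna()

print(mask.tolist())
count = int(mask.sum())  # number of NaNs after the first valid value
print(count)
```

The leading NaN stays NaN after the forward fill, so it drops out of the mask and is not counted.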

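On the whole frame, ffill would carry the last value of one CODE into the leading NaNs of the next, so apply the same mask per group with groupby.apply: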

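# Group by CODE so the forward fill never crosses from one CODE into the next
(df[['TMAX','TMIN','PP']].groupby(df['CODE'])
    # count the NaNs that have a valid value somewhere before them in the group
    .apply(lambda d: (d.isna() & d.ffill().notna()).sum())
)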

Output:

      TMAX  TMIN  PP
CODE                
130      2     1   0
135      0     1   0
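
For anyone who wants to try this out, below is a self-contained sketch. It assumes that only the ten rows visible in the question exist (the elided rows are not reconstructed) and that CODE is stored as a string, so the index shows `000130` rather than `130`. The second variant, which uses a grouped `ffill` in place of `apply`, is an alternative I am adding here, not part of the answer above; it should give the same counts.

```python
import numpy as np
import pandas as pd

# Assumption: just the rows visible in the question, CODE kept as strings.
df = pd.DataFrame({
    "CODE": ["000130"] * 5 + ["000135"] * 5,
    "DATE": pd.to_datetime([
        "1991-01-01", "1991-01-02", "1991-01-03", "1991-01-04", "1991-01-05",
        "1997-04-24", "1997-04-25", "1997-04-26", "1997-04-27", "1997-04-28",
    ]),
    "TMAX": [np.nan, 31.2, 32.0, np.nan, np.nan,
             np.nan, np.nan, np.nan, 31.0, 28.8],
    "TMIN": [np.nan, np.nan, 21.2, np.nan, 22.0,
             np.nan, np.nan, 22.1, np.nan, 24.0],
    "PP": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 5.0, 0.0],
})

cols = ["TMAX", "TMIN", "PP"]

# Variant 1: the groupby.apply approach from the answer.
per_group = (
    df[cols]
    .groupby(df["CODE"])
    .apply(lambda d: (d.isna() & d.ffill().notna()).sum())
)

# Variant 2 (alternative sketch): forward fill within each CODE first,
# then build the mask on the whole frame and sum it per CODE.
filled = df.groupby("CODE")[cols].ffill()
vectorized = (df[cols].isna() & filled.notna()).groupby(df["CODE"]).sum()

print(per_group)
print(vectorized)
```

Both variants keep the forward fill inside each CODE, which is what stops the leading NaNs of one CODE from being filled by the previous one. The second avoids calling a Python lambda per group, which may help when there are many groups.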
