Javier
Javier

Reputation: 513

How to remove rows that have all NaN values for a specific value in another column?

I have this df:

         CODE    MONTH_DAY TMAX_x
0        113250   01-01    nan
1        113250   01-02    nan
2        113250   01-03    nan
3        113250   01-04    nan
4        113250   01-05    nan
        ...     ...     ...
16975  47E94706   12-27    22.4
16976  47E94706   12-28    21.1
16977  47E94706   12-29    23.2
16978  47E94706   12-30    24.9
16979  47E94706   12-31    24.4

[16980 rows x 3 columns]

I want to remove rows that have a specific code value in CODE column, only if ALL of their TMAX_x values are nan.

For example: If ALL rows that have CODE equal to 133250 have only nan values in TMAX_x i must remove that rows. (There must be all values nan in TMAX_x column for a specific CODE value, if at least 1 value are different to nan in TMAX_x i must not remove any row).

Expected result:

         CODE    MONTH_DAY TMAX_x
        ...     ...     ...
16975  47E94706   12-27    22.4
16976  47E94706   12-28    21.1
16977  47E94706   12-29    23.2
16978  47E94706   12-30    24.9
16979  47E94706   12-31    24.4

Thanks in advance.

Upvotes: 0

Views: 42

Answers (1)

hilberts_drinking_problem
hilberts_drinking_problem

Reputation: 11602

You can use groupby followed by filter to keep groups that have at least one non-null value in TMAX_x column:

df.groupby('CODE').filter(lambda gp: gp.TMAX_x.notna().any())

Upvotes: 1

Related Questions