Haseeb Sultan

Reputation: 103

Drop rows in pandas if null values condition met

How can I remove the rows where the count of NaN values is greater than or equal to (>=) 5?

The DataFrame looks like this:

user_id b1 b2 b3 b4 b5 b6 b7 b8
1 NaN 1 NaN NaN NaN NaN 3 2
2 1 3 4 2 5 7 8 6
3 NaN 1 NaN 2 NaN 3 NaN NaN
4 1 3 4 NaN 5 2 7 6
5 NaN 3 2 NaN 4 1 5 NaN
6 NaN NaN NaN NaN NaN NaN 2 1

Upvotes: 0

Views: 956

Answers (2)

rhug123

Reputation: 8768

The thresh parameter of dropna() can be used for this. It sets the minimum number of non-NaN values a row must contain to be kept; rows with fewer non-NaN values are dropped.

For this problem, assuming user_id is the index, there are 8 value columns, so a thresh of 4 keeps only rows with at least 4 non-NaN values, that is, at most 4 NaN values. Any row with 5 or more NaN values is dropped.

df = df.dropna(thresh=4)
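As a rough sketch of how the thresh value follows from the NaN limit, the snippet below rebuilds the question's data and derives thresh from the column count. The variable names (min_nans_to_drop, thresh, kept) are made up for illustration, and it assumes user_id is the index rather than a regular column.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data from the question; user_id is assumed to be the index.
df = pd.DataFrame(
    {
        "user_id": [1, 2, 3, 4, 5, 6],
        "b1": [nan, 1, nan, 1, nan, nan],
        "b2": [1, 3, 1, 3, 3, nan],
        "b3": [nan, 4, nan, 4, 2, nan],
        "b4": [nan, 2, 2, nan, nan, nan],
        "b5": [nan, 5, nan, 5, 4, nan],
        "b6": [nan, 7, 3, 2, 1, nan],
        "b7": [3, 8, nan, 7, 5, 2],
        "b8": [2, 6, nan, 6, nan, 1],
    }
).set_index("user_id")

# Drop rows with this many NaNs or more (hypothetical name).
min_nans_to_drop = 5

# A row survives if it has at most (min_nans_to_drop - 1) NaNs,
# i.e. at least n_columns - min_nans_to_drop + 1 non-NaN values.
thresh = df.shape[1] - min_nans_to_drop + 1

# dropna returns a new DataFrame; the original df is left unchanged.
kept = df.dropna(thresh=thresh)
print(kept)
```

If user_id were a regular column instead of the index, it would count as one more non-NaN value in every row, so the thresh would need to be one higher.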

Upvotes: 2

sitting_duck

Reputation: 3720

You could leverage isna() to flag the NaNs and then sum() along axis=1 to get the NaN count per row. Comparing that count with >= 5 marks the rows to drop, so negating the result gives a boolean mask of the rows you want to keep.

dfd = df.loc[~(df.isna().sum(axis=1) >= 5)]
print(dfd)

Result

          b1   b2   b3   b4   b5   b6   b7   b8
user_id                                        
2        1.0  3.0  4.0  2.0  5.0  7.0  8.0  6.0
4        1.0  3.0  4.0  NaN  5.0  2.0  7.0  6.0
5        NaN  3.0  2.0  NaN  4.0  1.0  5.0  NaN
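To make the intermediate steps visible, here is a small sketch that splits the one-liner into the per-row NaN count and the mask. It rebuilds the question's data with user_id as the index; the names nan_counts and keep are illustrative only, and it uses the `< 5` comparison, which is equivalent to negating `>= 5`.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data from the question, one inner list per user_id.
rows = [
    [nan, 1, nan, nan, nan, nan, 3, 2],
    [1, 3, 4, 2, 5, 7, 8, 6],
    [nan, 1, nan, 2, nan, 3, nan, nan],
    [1, 3, 4, nan, 5, 2, 7, 6],
    [nan, 3, 2, nan, 4, 1, 5, nan],
    [nan, nan, nan, nan, nan, nan, 2, 1],
]
df = pd.DataFrame(
    rows,
    index=pd.Index(range(1, 7), name="user_id"),
    columns=[f"b{i}" for i in range(1, 9)],
)

# Step 1: count NaNs in each row (isna gives booleans, sum counts the Trues).
nan_counts = df.isna().sum(axis=1)

# Step 2: boolean mask of rows to keep; same as ~(nan_counts >= 5).
keep = nan_counts < 5

# Step 3: select the kept rows.
dfd = df.loc[keep]
print(dfd)
```

Keeping the count in its own variable also makes it easy to inspect how many NaNs each row has before deciding on a cutoff.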

Upvotes: 2
