Reputation: 103
How can I remove the rows in which the count of NaN values is greater than or equal to (>=) 5?
user_id | b1 | b2 | b3 | b4 | b5 | b6 | b7 | b8 |
---|---|---|---|---|---|---|---|---|
1 | NaN | 1 | NaN | NaN | NaN | NaN | 3 | 2 |
2 | 1 | 3 | 4 | 2 | 5 | 7 | 8 | 6 |
3 | NaN | 1 | NaN | 2 | NaN | 3 | NaN | NaN |
4 | 1 | 3 | 4 | NaN | 5 | 2 | 7 | 6 |
5 | NaN | 3 | 2 | NaN | 4 | 1 | 5 | NaN |
6 | NaN | NaN | NaN | NaN | NaN | NaN | 2 | 1 |
Upvotes: 0
Views: 956
Reputation: 8768
The `thresh` parameter of `dropna()` can be used for this. It sets the minimum number of non-NaN values a row must contain to be kept; any row with fewer non-NaN values than that is dropped.
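For this problem, since there are 8 value columns (assuming `user_id` is the index rather than a regular column), a `thresh` of 4 keeps only rows with at least 4 non-NaN values, which means each remaining row has at most 4 NaN values. Note that `dropna()` returns a new DataFrame, so assign the result if you want to keep it.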
df.dropna(thresh = 4)
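As a minimal, self-contained sketch of the above, the snippet below rebuilds the question's table and derives `thresh` from the column count instead of hard-coding it. It assumes `user_id` is the index; the names `max_nans`, `min_present`, and `kept` are illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

nan = np.nan

# The question's data, with user_id assumed to be the index.
df = pd.DataFrame(
    {
        "b1": [nan, 1, nan, 1, nan, nan],
        "b2": [1, 3, 1, 3, 3, nan],
        "b3": [nan, 4, nan, 4, 2, nan],
        "b4": [nan, 2, 2, nan, nan, nan],
        "b5": [nan, 5, nan, 5, 4, nan],
        "b6": [nan, 7, 3, 2, 1, nan],
        "b7": [3, 8, nan, 7, 5, 2],
        "b8": [2, 6, nan, 6, nan, 1],
    },
    index=pd.Index([1, 2, 3, 4, 5, 6], name="user_id"),
)

# Drop rows with 5 or more NaNs, i.e. allow at most 4 NaNs per row.
max_nans = 4
# thresh counts non-NaN values, so convert the NaN limit accordingly.
min_present = df.shape[1] - max_nans  # 8 columns - 4 = 4

kept = df.dropna(thresh=min_present)
print(kept)
```

Computing `thresh` from `df.shape[1]` keeps the rule correct if columns are added or removed later.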
Upvotes: 2
Reputation: 3720
You could use `isna()` to identify the NaNs and then `sum()` them with `axis=1` to get the NaN count per row. Then use that as a (negated) mask to keep the rows you want.
dfd = df.loc[~(df.isna().sum(axis=1)>=5)]
print(dfd)
Result
b1 b2 b3 b4 b5 b6 b7 b8
user_id
2 1.0 3.0 4.0 2.0 5.0 7.0 8.0 6.0
4 1.0 3.0 4.0 NaN 5.0 2.0 7.0 6.0
5 NaN 3.0 2.0 NaN 4.0 1.0 5.0 NaN
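If `user_id` is a regular column rather than the index, the count should be restricted to the value columns. The sketch below shows one way to do that under that assumption, and checks that the mask approach agrees with `dropna(subset=..., thresh=...)`. The names `value_cols`, `nan_count`, `filtered`, and `same` are illustrative only.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Same data as in the question, but with user_id as an ordinary column.
cols = ["user_id"] + [f"b{i}" for i in range(1, 9)]
rows = [
    [1, nan, 1, nan, nan, nan, nan, 3, 2],
    [2, 1, 3, 4, 2, 5, 7, 8, 6],
    [3, nan, 1, nan, 2, nan, 3, nan, nan],
    [4, 1, 3, 4, nan, 5, 2, 7, 6],
    [5, nan, 3, 2, nan, 4, 1, 5, nan],
    [6, nan, nan, nan, nan, nan, nan, 2, 1],
]
df = pd.DataFrame(rows, columns=cols)

# Count NaNs only in b1..b8 so user_id does not affect the result.
value_cols = cols[1:]
nan_count = df[value_cols].isna().sum(axis=1)

# Keeping rows with fewer than 5 NaNs is the same as negating (>= 5).
filtered = df.loc[nan_count < 5]

# Equivalent filter using dropna restricted to the value columns.
same = df.dropna(subset=value_cols, thresh=len(value_cols) - 4)

print(filtered)
```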
Upvotes: 2