How to replace NaN with the column mean only if a column has at most a given number of NaNs?

Question

I have a fairly large dataset, with 200 rows and 6000+ columns.

I already know the Pandas function to replace the NaNs with the mean of each column:

df.fillna(df.mean())

My problem is that I want to apply it only to those columns in which the total number of NaNs is less than or equal to 3. Any hints or solutions would be very much appreciated.

yatu · Accepted Answer

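The following should work: build a boolean mask of the columns with at most 3 NaNs, then fill only those columns with their own means: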

m = df.isna().sum(0).le(3)
df.loc[:,m] = df.loc[:,m].fillna(df.loc[:,m].mean(0))
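
To make the mask step concrete, here is a minimal sketch on a small hand-built frame. The column names `few` and `many` and the intermediate variable `nan_counts` are made up for illustration and are not part of the original answer:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "few" has 2 NaNs, "many" has 4 NaNs
df = pd.DataFrame({
    "few": [1.0, np.nan, 3.0, 5.0, np.nan],
    "many": [np.nan, np.nan, np.nan, np.nan, 4.0],
})

nan_counts = df.isna().sum(0)  # number of NaNs in each column
m = nan_counts.le(3)           # True for columns with at most 3 NaNs

print(nan_counts)
print(m)
```

Only the columns where `m` is True are selected by `df.loc[:, m]`, so only those are filled.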

Checking on a random test DataFrame (the values will differ from run to run):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.choice([float('nan'), *range(3)], (10, 3)))

print(df)

     0    1    2
0  NaN  1.0  0.0
1  2.0  0.0  NaN
2  0.0  1.0  0.0
3  NaN  1.0  0.0
4  2.0  NaN  1.0
5  NaN  2.0  1.0
6  2.0  2.0  NaN
7  NaN  NaN  NaN
8  1.0  NaN  0.0
9  2.0  0.0  2.0

m = df.isna().sum(0).le(3)
df.loc[:,m] = df.loc[:,m].fillna(df.loc[:,m].mean(0))

print(df)

     0    1         2
0  NaN  1.0  0.000000
1  2.0  0.0  0.571429
2  0.0  1.0  0.000000
3  NaN  1.0  0.000000
4  2.0  1.0  1.000000
5  NaN  2.0  1.000000
6  2.0  2.0  0.571429
7  NaN  1.0  0.571429
8  1.0  1.0  0.000000
9  2.0  0.0  2.000000
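
In the output above, column 0 has four NaNs and is left alone, while columns 1 and 2 have three each and are filled. If the threshold needs to vary, the same idea could be wrapped in a small helper. This is only a sketch under assumptions: the function name `fill_sparse_nans` and the parameter `max_nans` are hypothetical, and it uses a variation on the answer's approach, passing a Series of means to `fillna` (which fills only the columns named in that Series) instead of assigning through `.loc`.

```python
import numpy as np
import pandas as pd


def fill_sparse_nans(df, max_nans=3):
    """Return a copy of df with NaNs mean-filled in columns having <= max_nans NaNs."""
    means = df.mean(numeric_only=True)             # per-column means, numeric columns only
    nan_counts = df.isna().sum().loc[means.index]  # NaN counts for those same columns
    keep = nan_counts <= max_nans                  # columns that qualify for filling
    # fillna with a Series fills only the columns present in the Series index
    return df.fillna(means[keep])


# Deterministic toy data (hypothetical column names)
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan, np.nan, np.nan],  # 4 NaNs: left untouched
    "b": [np.nan, 2.0, 4.0, 6.0, np.nan, 8.0],        # 2 NaNs: filled with the mean, 5.0
})
result = fill_sparse_nans(df, max_nans=3)
print(result)
```

Because `fillna` returns a new frame here, the original `df` is not modified; assign the result back if in-place behaviour is wanted.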
