Conillina

Reputation: 33

How to replace NaN with the column mean only if a column has fewer than a given number of NaNs?

I have a fairly large dataset, with 200 rows and 6000+ columns.

I already know the Pandas function to replace the NaNs with the mean of each column:

df.fillna(df.mean())

My problem is that I want to apply it only to those columns in which the total number of NaNs is less than or equal to 3. Any hints or solutions would be very much appreciated.

Upvotes: 0

Views: 145

Answers (1)

yatu

Reputation: 88305

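The following should work. It builds a boolean mask of the columns with at most 3 NaNs, then fills only those columns:

# True for each column that has at most 3 NaNs
m = df.isna().sum(0).le(3)
# Fill only the selected columns, each with its own mean
df.loc[:, m] = df.loc[:, m].fillna(df.loc[:, m].mean(0))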
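
To make the mask less opaque, here is a small sketch of what each step produces. The frame `toy` and its column names `x` and `y` are made up for illustration and are not from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: column "x" has 1 NaN, column "y" has 4.
toy = pd.DataFrame({
    "x": [1.0, np.nan, 3.0, 5.0, 3.0],
    "y": [np.nan, np.nan, np.nan, np.nan, 2.0],
})

nan_counts = toy.isna().sum(axis=0)  # number of NaNs in each column
m = nan_counts.le(3)                 # True where the count is <= 3
selected = toy.loc[:, m]             # only the columns that qualify

print(nan_counts)
print(m)
print(selected.columns.tolist())
```

Because `m` is indexed by column label, `toy.loc[:, m]` keeps `x` and drops `y`, which is why only the sparse-NaN columns get filled.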

Checking on a test dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.choice([float('nan'), *range(3)], (10, 3)))

print(df)

     0    1    2
0  NaN  1.0  0.0
1  2.0  0.0  NaN
2  0.0  1.0  0.0
3  NaN  1.0  0.0
4  2.0  NaN  1.0
5  NaN  2.0  1.0
6  2.0  2.0  NaN
7  NaN  NaN  NaN
8  1.0  NaN  0.0
9  2.0  0.0  2.0

m = df.isna().sum(0).le(3)
df.loc[:,m] = df.loc[:,m].fillna(df.loc[:,m].mean(0))

print(df)

     0    1         2
0  NaN  1.0  0.000000
1  2.0  0.0  0.571429
2  0.0  1.0  0.000000
3  NaN  1.0  0.000000
4  2.0  1.0  1.000000
5  NaN  2.0  1.000000
6  2.0  2.0  0.571429
7  NaN  1.0  0.571429
8  1.0  1.0  0.000000
9  2.0  0.0  2.000000
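
If you would rather not assign back into the frame, one possible variant relies on the fact that `fillna` given a Series only fills the columns whose labels appear in that Series. The sketch below assumes every column is numeric; the function name `fill_sparse_nans`, the `max_nans` parameter, and the sample data are hypothetical, chosen only for illustration:

```python
import numpy as np
import pandas as pd


def fill_sparse_nans(df, max_nans=3):
    """Return a copy of df with NaNs replaced by the column mean,
    but only in columns that have at most max_nans missing values."""
    # Boolean Series indexed by column label.
    few_nans = df.isna().sum(axis=0).le(max_nans)
    # Keep the means of the qualifying columns only; fillna leaves
    # columns that are absent from this Series untouched.
    means = df.mean(axis=0)[few_nans]
    return df.fillna(means)


nan = np.nan
example = pd.DataFrame({
    "a": [nan, 2.0, 0.0, nan, 2.0, nan, 2.0, nan, 1.0, 2.0],  # 4 NaNs: left as-is
    "b": [1.0, 0.0, 1.0, 1.0, nan, 2.0, 2.0, nan, nan, 0.0],  # 3 NaNs: filled
})

result = fill_sparse_nans(example)
print(result)
```

This returns a new frame and leaves `example` unchanged, which can be convenient when the threshold needs to be tried with several values.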

Upvotes: 1
