I have a fairly large dataset with 200 rows and 6000+ columns.
I already know how to replace the NaNs in every column with that column's mean in pandas:
df.fillna(df.mean())
My problem is that I want to apply it only to the columns whose total number of NaNs is less than or equal to 3. Any hints or solutions would be much appreciated.
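Build a boolean mask of the columns with at most 3 NaNs, then fill only those columns. The following should work:
# True for each column that contains 3 or fewer NaNs
m = df.isna().sum(0).le(3)
# fill the NaNs in those columns only, each with its own column mean
df.loc[:,m] = df.loc[:,m].fillna(df.loc[:,m].mean(0))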
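A possibly simpler variant, sketched here as an alternative rather than a replacement: `fillna` accepts a Series of fill values keyed by column label, and columns whose label is not in that Series are left untouched. Filtering the column means with the same mask therefore avoids the `.loc` assignment and returns a new frame. The helper name `fill_low_nan_columns` and its `max_nans` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd


def fill_low_nan_columns(df: pd.DataFrame, max_nans: int = 3) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df in which every column with
    at most max_nans NaNs has its NaNs replaced by that column's mean."""
    # True for every column whose NaN count is <= max_nans
    mask = df.isna().sum(axis=0).le(max_nans)
    # Means of the qualifying columns only; fillna leaves alone any
    # column whose label is absent from this Series
    means = df.mean(axis=0)[mask]
    return df.fillna(means)


df = pd.DataFrame({"a": [np.nan, 1.0, 3.0], "b": [np.nan, np.nan, 4.0]})
print(fill_low_nan_columns(df, max_nans=1))
```

With `max_nans=1`, column `a` (one NaN) is filled with its mean, 2.0, while column `b` (two NaNs) keeps its NaNs; the original `df` is not modified.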
Checking it on a random test DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.choice([float('nan'), *range(3)], (10, 3)))
print(df)
     0    1    2
0  NaN  1.0  0.0
1  2.0  0.0  NaN
2  0.0  1.0  0.0
3  NaN  1.0  0.0
4  2.0  NaN  1.0
5  NaN  2.0  1.0
6  2.0  2.0  NaN
7  NaN  NaN  NaN
8  1.0  NaN  0.0
9  2.0  0.0  2.0
m = df.isna().sum(0).le(3)
df.loc[:,m] = df.loc[:,m].fillna(df.loc[:,m].mean(0))
print(df)
     0    1         2
0  NaN  1.0  0.000000
1  2.0  0.0  0.571429
2  0.0  1.0  0.000000
3  NaN  1.0  0.000000
4  2.0  1.0  1.000000
5  NaN  2.0  1.000000
6  2.0  2.0  0.571429
7  NaN  1.0  0.571429
8  1.0  1.0  0.000000
9  2.0  0.0  2.000000
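Because `np.random.choice` draws a new frame on every run, the output above cannot be reproduced directly. The sketch below types in the printed test frame by hand and asserts what the output shows: column 0 (4 NaNs) is left alone, while columns 1 and 2 (3 NaNs each) are filled with their means, 7/7 = 1.0 and 4/7 ≈ 0.571429.

```python
import numpy as np
import pandas as pd

nan = np.nan
# The random test frame printed above, typed in so the check is repeatable
df = pd.DataFrame({
    0: [nan, 2.0, 0.0, nan, 2.0, nan, 2.0, nan, 1.0, 2.0],
    1: [1.0, 0.0, 1.0, 1.0, nan, 2.0, 2.0, nan, nan, 0.0],
    2: [0.0, nan, 0.0, 0.0, 1.0, 1.0, nan, nan, 0.0, 2.0],
})

# Same two lines as in the answer
m = df.isna().sum(0).le(3)
df.loc[:, m] = df.loc[:, m].fillna(df.loc[:, m].mean(0))

# Column 0 had 4 NaNs, so it must be untouched
assert df[0].isna().sum() == 4
# Columns 1 and 2 had exactly 3 NaNs each, so they are now complete
assert df[[1, 2]].notna().all().all()
# Filled values equal the pre-fill column means: 7/7 and 4/7
assert np.allclose(df.loc[[4, 7, 8], 1], 1.0)
assert np.allclose(df.loc[[1, 6, 7], 2], 4 / 7)
print(df)
```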