How can I replace missing values with the mean in Python?

Question

In the code below, I'm trying to replace missing values with the column mean, but my attempts fail because the data marks missing values with the special character "?". When there are no question marks in the data, data.fillna(data.mean()) works. When I tried the impute method, I got the following error:

ValueError: Cannot use mean strategy with non-numeric data: could not convert string to float:

The data also includes string columns with missing values. How can I fill the missing values in those string columns (column rbc, for example)?

Here is my data: https://easyupload.io/te2mbc

import pandas as pd

# Raw string, so the backslashes in the Windows path are not read as escape sequences
path = r"C:\Users\bbb\Desktop\ccc\group5data.txt"
names = ["age","bp","sg","al","su","rbc","pc","pcc","ba",
         "bgr","bu","sc","sod","pot","hemo","pcv","wc",
         "rc","htn","dm","cad","appet","pe","ane","class"]
data = pd.read_csv(path, names=names)

joao · Accepted Answer

The '?' characters in columns 'sod' and 'pot' make pandas parse those columns as strings, so even if you do

df.replace('?', np.nan)

the column will have both (float) NaNs and strings, so pandas won't be able to calculate a mean() for it. I believe this is what causes your ValueError.
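To see the mixed column for yourself, here is a minimal sketch on a made-up three-row column (the values are hypothetical, not taken from the linked file). It only shows that the column is still non-numeric after the replace:

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the real 'sod' column comes from group5data.txt.
sample = pd.DataFrame({"sod": ["111", "?", "142"]})
replaced = sample.replace("?", np.nan)

# The '?' is now NaN, but the surviving values are still strings,
# so pandas does not treat the column as numeric.
print(pd.api.types.is_numeric_dtype(replaced["sod"]))  # False
print([type(v).__name__ for v in replaced["sod"]])
```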

So try converting those columns to float (not int, because np.nan is float):

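import numpy as np
import pandas as pd

# on_bad_lines='skip' needs pandas >= 1.3; older versions used error_bad_lines=False
df = pd.read_csv('C:/a/sw/group5data.txt', on_bad_lines='skip', names=names)
df = df.replace('?', np.nan)
df[['sod', 'pot']] = df[['sod', 'pot']].astype(float)
df = df.fillna(df.mean(numeric_only=True))  # numeric_only skips the string columns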

The NaNs have now been replaced with each column's mean (of course, the columns are now float instead of the original int, but that should be easy to fix).
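As an alternative sketch, read_csv can be told up front that '?' means missing through its na_values parameter, so the affected columns are parsed as numbers and no separate replace/astype step is needed. The inline data and the three column names below are hypothetical stand-ins for the real file:

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file; values are illustrative only.
raw = io.StringIO("48,80,?\n62,?,4.2\n51,70,3.8\n")
cols = ["age", "bp", "pot"]

# na_values="?" makes the parser read '?' as NaN, so the columns stay numeric.
df = pd.read_csv(raw, names=cols, na_values="?")

# Fill each numeric column's gaps with that column's mean.
df = df.fillna(df.mean(numeric_only=True))
print(df)
```

This assumes '?' is the only missing-value marker in the file and that it appears without surrounding whitespace.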


Answers (2)

Select numeric columns.

Fill numeric columns with mean.
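The two steps above could look roughly like this. The small DataFrame is a hypothetical sample, and the last step (filling string columns such as rbc with their most frequent value) is an assumption added here as one common option, not something either answer prescribes:

```python
import numpy as np
import pandas as pd

# Hypothetical sample; "rbc" stands in for a string column with gaps.
df = pd.DataFrame({
    "age": [48.0, np.nan, 62.0],
    "rbc": ["normal", None, "normal"],
})

# Select numeric columns.
num_cols = df.select_dtypes(include="number").columns

# Fill numeric columns with mean.
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())

# Assumption: fill the remaining (string) columns with their mode.
# mode() is empty for an all-missing column, so this sketch expects
# at least one non-missing value per column.
for col in df.columns.difference(num_cols):
    df[col] = df[col].fillna(df[col].mode().iloc[0])

print(df)
```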
