tunaantastan

Reputation: 13

How can I replace missing values with the mean in Python?

In the code below, I'm trying to replace missing values with the column mean, but my attempts fail because the data marks missing values with the special character "?". When there are no question marks in the data, data.fillna(data.mean()) works. When I tried the impute method, I got the following error:

ValueError: Cannot use mean strategy with non-numeric data: could not convert string to float:

The data also includes string columns with missing values. How can I fill the missing values in the string columns (column rbc, for example)?

Here is my data: https://easyupload.io/te2mbc

import pandas as pd

path = "C:\\Users\\bbb\\Desktop\\ccc\\group5data.txt"
names = ["age","bp","sg","al","su","rbc","pc","pcc","ba",
         "bgr","bu","sc","sod","pot","hemo","pcv","wc",
         "rc","htn","dm","cad","appet","pe","ane","class"]
data = pd.read_csv(path, names=names)

Upvotes: 1

Views: 847

Answers (2)

Bhavani Ravi

Reputation: 2291

Your data contains both numeric and non-numeric columns. To fill missing values with the mean, you need to select just the numeric columns.

Select the numeric columns.

num_cols = data.select_dtypes('number').columns

Fill the numeric columns with their means, leaving the other columns untouched.

data[num_cols] = data[num_cols].fillna(data[num_cols].mean())
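The question also asks about string columns such as rbc, which have no mean. One common option, offered here as an assumption rather than something the asker confirmed they want, is to fill those columns with their most frequent value (the mode). A minimal sketch on a made-up two-column sample (the values are hypothetical, not taken from the linked file):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one numeric and one string column.
sample = pd.DataFrame({
    "age": [48.0, np.nan, 62.0],
    "rbc": ["normal", None, "normal"],
})

# Numeric columns: fill missing values with the column mean.
num_cols = sample.select_dtypes("number").columns
sample[num_cols] = sample[num_cols].fillna(sample[num_cols].mean())

# Remaining (string) columns: fill with the most frequent value.
# mode() ignores missing values; iloc[0] picks the first mode on ties.
str_cols = sample.columns.difference(num_cols)
for col in str_cols:
    sample[col] = sample[col].fillna(sample[col].mode().iloc[0])
```

Note that a column containing only missing values has an empty mode, so the iloc[0] lookup would fail there; such columns would need separate handling.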

Upvotes: 0

joao

Reputation: 2293

The '?' characters in columns 'sod' and 'pot' make pandas parse those columns as strings, so even if you do

df.replace('?', np.nan)

the column will have both (float) NaNs and strings, so pandas won't be able to calculate a mean() for it. I believe this is what causes your ValueError.
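To see this for yourself, here is a small sketch on made-up values (the numbers are hypothetical, only the column names come from the question). It checks whether the columns count as numeric before and after an explicit conversion:

```python
import io
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical two-column sample; '?' marks a missing value.
raw = io.StringIO("135,4.1\n?,3.9\n140,?\n")
demo = pd.read_csv(raw, names=["sod", "pot"])

# Replacing '?' with NaN leaves the remaining values as strings,
# so the columns are still not numeric.
replaced = demo.replace("?", np.nan)
numeric_after_replace = is_numeric_dtype(replaced["sod"])

# An explicit cast parses the strings and keeps NaN as NaN.
converted = replaced.astype(float)
numeric_after_cast = is_numeric_dtype(converted["sod"])
sod_mean = converted["sod"].mean()
```

Here numeric_after_replace is False and numeric_after_cast is True, which is why mean() only becomes usable after the cast.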

So try converting those columns to float (not int, because np.nan is float):

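import numpy as np
import pandas as pd
# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
df = pd.read_csv('C:/a/sw/group5data.txt', on_bad_lines='skip', names=names)
df = df.replace('?', np.nan)
# assign whole columns so their dtype actually becomes float
df[['sod', 'pot']] = df[['sod', 'pot']].astype(float)
df = df.fillna(df.mean(numeric_only=True))  # skip the string columns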

The NaNs have now been replaced with each column's mean (of course, the columns are now float instead of the original int, but that should be easy to fix).
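An alternative that avoids the separate replace and cast steps is to tell read_csv up front that '?' means missing, using its na_values parameter. This is a sketch under the assumption that the markers are exactly '?' with no stray whitespace; the inline data below is a made-up stand-in for the real file:

```python
import io
import pandas as pd

# Hypothetical stand-in for the file; '?' marks missing values.
raw = io.StringIO("48,135,4.1\n53,?,3.9\n61,140,?\n")
cols = ["age", "sod", "pot"]

# na_values makes read_csv treat '?' as NaN while parsing,
# so the affected columns come out as float instead of strings.
df = pd.read_csv(raw, names=cols, na_values="?")

# numeric_only=True skips any string columns when computing means.
df = df.fillna(df.mean(numeric_only=True))
```

With the real file you would pass the path and the full names list instead of the in-memory sample.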

Upvotes: 1
