Reputation: 13
In the code below, I'm trying to replace missing values with the column mean, but my attempts fail because the data marks missing entries with the special character "?". When the data contains no question marks, data.fillna(data.mean()) works. When I tried the impute method, I got the following error:
ValueError: Cannot use mean strategy with non-numeric data: could not convert string to float:
This data also includes string columns with missing values. How can I fix the missing values in string columns (column rbc, for example)?
Here is my data: https://easyupload.io/te2mbc
import pandas as pd
path = "C:\\Users\\bbb\\Desktop\\ccc\\group5data.txt"
names = ["age","bp","sg","al","su","rbc","pc","pcc","ba",
"bgr","bu","sc","sod","pot","hemo","pcv","wc",
"rc","htn","dm","cad","appet","pe","ane","class"]
data = pd.read_csv(path, names=names)
Upvotes: 1
Views: 847
Reputation: 2291
Your data consists of both numeric and non-numeric columns. In order to fillna
with the mean, you need to select just the numeric columns:
num_cols = data.select_dtypes('number').columns
data[num_cols] = data[num_cols].fillna(data[num_cols].mean())
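As a minimal sketch of this idea, the snippet below fills numeric columns with their mean and, as one possible answer to the string-column part of the question, fills the remaining columns with their most frequent value (the mode). The toy frame and the choice of mode imputation are assumptions for illustration, not something taken from the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one numeric column and one string column.
data = pd.DataFrame({
    "age": [40.0, np.nan, 60.0],
    "rbc": ["normal", None, "normal"],
})

# Numeric columns: fill with the column mean.
num_cols = data.select_dtypes("number").columns
data[num_cols] = data[num_cols].fillna(data[num_cols].mean())

# Remaining (string) columns: fill with the most frequent value.
# mode() ignores missing values by default; iloc[0] picks the first mode.
other_cols = data.columns.difference(num_cols)
for col in other_cols:
    data[col] = data[col].fillna(data[col].mode().iloc[0])

print(data)
```

Whether the mode is a sensible replacement depends on the column; a separate "missing" category is another option.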
Upvotes: 0
Reputation: 2293
The fact that you have '?' characters in columns 'sod' and 'pot' makes pandas parse those columns as strings, so even if you do
df.replace('?', np.nan)
the column will have both (float) NaNs and strings, so pandas won't be able to calculate a mean() for it. I believe this is what causes your ValueError.
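A small sketch of the effect described here, using a made-up three-value Series rather than the real 'sod' column: after the replace the values are still strings, and an explicit conversion (here pd.to_numeric, used as an alternative to astype(float)) is what makes a mean possible.

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for a column that contained "?".
s = pd.Series(["135", "?", "140"])

replaced = s.replace("?", np.nan)
# Still a string/object column, not a numeric one.
print(replaced.dtype)

# Convert the remaining strings to floats; NaN stays NaN.
converted = pd.to_numeric(replaced)
print(converted.dtype)
print(converted.mean())
```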
So try converting those columns to float (not int, because np.nan is float):
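import numpy as np
import pandas as pd
# on_bad_lines='skip' needs pandas >= 1.3; older versions used error_bad_lines=False
df = pd.read_csv('C:/a/sw/group5data.txt', on_bad_lines='skip', names=names)
df = df.replace('?', np.nan)
df[['sod', 'pot']] = df[['sod', 'pot']].astype(float)
# numeric_only=True skips the string columns when computing the means
df = df.fillna(df.mean(numeric_only=True))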
The NaNs in the numeric columns have now been replaced with each column's mean (of course, the columns are now float instead of the original int, but that should be easy to fix).
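An alternative sketch, assuming "?" is the only missing-value marker in the file: passing na_values="?" to read_csv lets pandas treat it as NaN while parsing, so the affected columns come out numeric and no later conversion is needed. The in-memory sample and the shortened column list below are hypothetical stand-ins for group5data.txt.

```python
import io
import pandas as pd

# Hypothetical four-row sample standing in for the real file.
raw = io.StringIO(
    "48,137,4.5\n"
    "62,?,?\n"
    "51,141,5.5\n"
    "70,139,5.0\n"
)
names = ["age", "sod", "pot"]

# "?" is read as NaN, so 'sod' and 'pot' are parsed as float columns.
df = pd.read_csv(raw, names=names, na_values="?")

# Fill the numeric columns with their own means.
num_cols = df.select_dtypes("number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())

print(df)
```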
Upvotes: 1