Reputation: 119
I am trying to preprocess my data by replacing the missing values with the column mean.
My code is as follows:
#Load the Data
import numpy as np
data_2 = np.genfromtxt('data.csv', delimiter=',', skip_header=1)
#the missing values in my dataset are identified by value = 0
#I'm trying to replace the missing values in the third column
from sklearn.preprocessing import Imputer
imp = Imputer(missing_values=0, strategy='mean', axis=0)
imp.fit(data_2[:, 2])
It runs, but gives these warnings:
/Users/user1/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
DeprecationWarning)
/Users/user1/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
DeprecationWarning)
My main problem, though, is that it did not fill in the missing data: I printed the data before and after fitting and nothing changed.
What am I doing wrong?
Update:
Here are a few lines of my dataset:
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
Upvotes: 0
Views: 908
Reputation: 754
Consider this slightly modified version of your dataset, with some values left blank, as an illustration.
6,148,72,35,0,33.6,0.627,50,1
1,85,,29,0,26.6,0.351,,
,183,64,,0,,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
An easy way to fill missing values is to use the pandas library:
# Load libraries and data
import pandas as pd
df = pd.read_csv('data.csv', names=[1, 2, 3, 4, 5, 6, 7, 8, 9])
# Fill the null values with the mean of each column
df = df.fillna(df.mean())
The names argument of read_csv assigns names to the columns of the CSV file.
fillna() fills the missing values; passing df.mean() fills each column's gaps with that column's own mean.
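Since your file marks missing values with 0 rather than with blanks, the zeros have to be turned into NaN before fillna() can see them. Below is a minimal sketch of that, under a few assumptions: the rows are the four from your update, inlined through io.StringIO so it runs without data.csv, and the column is chosen with a hypothetical variable col. You asked about the third column, but in those four rows only the fourth column contains a 0 where a real value is expected, so the sketch uses that one; change col for your real data. As a side note on your original attempt, fit() in scikit-learn only computes the mean; the filled array is returned by transform() or fit_transform() and the input is not modified in place.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the sketch runs without data.csv.
csv_text = """6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
"""

df = pd.read_csv(io.StringIO(csv_text), names=[1, 2, 3, 4, 5, 6, 7, 8, 9])

# Hypothetical choice: the column in which 0 means "missing".
col = 4

# Turn the zeros in that column into NaN; other columns are left alone.
df[col] = df[col].mask(df[col] == 0)

# mean() skips NaN, so this is the mean of the observed values only.
df[col] = df[col].fillna(df[col].mean())

print(df[col].tolist())  # [35.0, 29.0, 29.0, 23.0]
```

Only the chosen column is touched, so zeros that are real values elsewhere (for example in the last column, which looks like a 0/1 label) stay as they are.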
Upvotes: 1