Reputation: 94
I have a dataset Data.csv
Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,67000,Yes
I tried to fill nan values using sklearn.impute.SimpleImputer by using following code
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
# Taking care of missing data
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values = 'NaN', strategy = 'mean')
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
But I get a error which says:
File "C:\Users\Krishna Rohith\Machine Learning A-Z\Part 1 - Data Preprocessing\Section 2 ----------- --------- Part 1 - Data Preprocessing --------------------\missing_data.py", line 16, in <module>
imputer = imputer.fit(X[:, 1:3])
File "C:\Users\Krishna Rohith\Anaconda3\lib\site-packages\sklearn\impute\_base.py", line 268, in fit
X = self._validate_input(X)
File "C:\Users\Krishna Rohith\Anaconda3\lib\site-packages\sklearn\impute\_base.py", line 242, in _validate_input
raise ve
File "C:\Users\Krishna Rohith\Anaconda3\lib\site-packages\sklearn\impute\_base.py", line 235, in _validate_input
force_all_finite=force_all_finite, copy=self.copy)
File "C:\Users\Krishna Rohith\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 562, in check_array
allow_nan=force_all_finite == 'allow-nan')
File "C:\Users\Krishna Rohith\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 60, in _assert_all_finite
msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
I know how to do it numpy but can someone please tell me using sklearn.impute?
Upvotes: 1
Views: 2678
Reputation: 26
imputer = SimpleImputer(missing_values = np.nan, strategy = 'mean')
Replace 'NaN' by numpy default Nan np.nan
Upvotes: 1