Reputation: 682
I am new to machine learning. I am trying to fill in the missing data in a dataset using the Spyder IDE, so I want to use the sklearn.preprocessing.Imputer class from scikit-learn. However, I am getting this error: TypeError: unhashable type: 'slice'
# IMPORTING THE LIBRARY
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# IMPORTING THE DATASET
dataset = pd.read_csv('file:///C:/Users/Bhaskar Das/Desktop/Udemy/Machine_Learning_AZ_Template_Folder/Machine Learning A-Z Template Folder/Part 1 - Data Preprocessing/Section 2 -------------------- Part 1 - Data Preprocessing --------------------/Data.csv')
x = dataset.iloc[:,:].values
# =============================================================================
# Out[4]:
# Country Age Salary Purchased
# 0 France 44.0 72000.0 No
# 1 Spain 27.0 48000.0 Yes
# 2 Germany 30.0 54000.0 No
# 3 Spain 38.0 61000.0 No
# 4 Germany 40.0 NaN Yes
# 5 France 35.0 58000.0 Yes
# 6 Spain NaN 52000.0 No
# 7 France 48.0 79000.0 Yes
# 8 Germany 50.0 83000.0 No
# 9 France 37.0 67000.0 Yes
# =============================================================================
# TAKING CARE OF MISSING DATA
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = '0')
imputer = imputer.fit(x[:, 1:3])
x[:, 1:3] = imputer.transform(x[:, 1:3])
This is the exact line from the code above that throws the error shown in the output below:
imputer = imputer.fit(x[:, 1:3])
OUTPUT:
runfile('C:/Users/Bhaskar Das/.spyder-py3/DataPreprocessing-ML.py', wdir='C:/Users/Bhaskar Das/.spyder-py3')
Traceback (most recent call last):
File "<ipython-input-19-2ed93817316a>", line 1, in <module>
runfile('C:/Users/Bhaskar Das/.spyder-py3/DataPreprocessing-ML.py', wdir='C:/Users/Bhaskar Das/.spyder-py3')
File "C:\Users\Bhaskar Das\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "C:\Users\Bhaskar Das\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Bhaskar Das/.spyder-py3/DataPreprocessing-ML.py", line 142, in <module>
imputer = imputer.fit(x[:, 1:3])
File "C:\Users\Bhaskar Das\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2139, in __getitem__
return self._getitem_column(key)
File "C:\Users\Bhaskar Das\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2146, in _getitem_column
return self._get_item_cache(key)
File "C:\Users\Bhaskar Das\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1840, in _get_item_cache
res = cache.get(item)
TypeError: unhashable type: 'slice'
Upvotes: 0
Views: 805
Reputation: 1
from sklearn.preprocessing import imputation
imputer = Imputer(missing_values='Nan', strategy='mean', axis=0)
imputer= imputer.fit(x[:, 1:3])
x[:, 1:3]=imputer.transform(x[:, 1:3])
C:\Users\Pravesh\Anaconda3\lib\site-packages\sklearn\utils\deprecation.py:66: DeprecationWarning: Class Imputer is deprecated; Imputer was deprecated in version 0.20 and will be removed in 0.22. Import impute.SimpleImputer from sklearn instead.
warnings.warn(msg, category=DeprecationWarning)
Traceback (most recent call last):
File "", line 3, in
imputer= imputer.fit(x[:, 1:3])
File "C:\Users\Pravesh\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2980, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\Pravesh\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2897, in get_loc
return self._engine.get_loc(key)
File "pandas_libs\index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas_libs\index.pyx", line 109, in pandas._libs.index.IndexEngine.get_loc
TypeError: '(slice(None, None, None), slice(1, 3, None))' is an invalid key
I get this error when running the same code.
Upvotes: 0
Reputation: 1150
Why oh why do people insist on making their lives more difficult by extracting arrays from dataframes? sklearn plays well with pandas!!
Once you fix the error that @ely has pointed out, you will hit a second one, because you wrote:
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = '0')
when I think you mean to write:
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
In the documentation, it says:
axis : integer, optional (default=0)
The axis along which to impute.
If axis=0, then impute along columns. If axis=1, then impute along rows.
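As the deprecation warning quoted elsewhere in this thread says, Imputer was deprecated in 0.20 in favour of impute.SimpleImputer. Below is a minimal sketch of the same mean imputation with SimpleImputer, applied directly to the DataFrame columns. It assumes a scikit-learn version that ships sklearn.impute; the DataFrame is rebuilt by hand from the rows printed in the question, and the name numeric_cols is only illustrative. SimpleImputer has no axis argument and always imputes column by column, so the axis = '0' problem does not come up.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Rows copied from the Data.csv printout in the question.
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44.0, 27.0, 30.0, 38.0, 40.0, 35.0, np.nan, 48.0, 50.0, 37.0],
    "Salary": [72000.0, 48000.0, 54000.0, 61000.0, np.nan,
               58000.0, 52000.0, 79000.0, 83000.0, 67000.0],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

# Illustrative name: the numeric columns that contain missing values.
numeric_cols = ["Age", "Salary"]

# Replace each NaN with the mean of its own column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
dataset[numeric_cols] = imputer.fit_transform(dataset[numeric_cols])

print(dataset)
```

Selecting the columns by label keeps everything in pandas, so there is no need to extract an array and slice it by position.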
Upvotes: 1
Reputation: 77424
Your variable x is a pandas DataFrame with column labels. As a result, the syntax you use,
x[:, 1:3]
is invalid for slicing along the column dimension, and this is what generates the error message.
So somewhere there is another bug or you have not shown the exact code you are using, since otherwise the operation
x = dataset.iloc[:,:].values
should return a numpy ndarray that would not result in this error message.
This is corroborated by the traceback you showed in your question, which says:
File "C:/Users/Bhaskar Das/.spyder-py3/DataPreprocessing-ML.py", line 142, in <module>
imputer = imputer.fit(x[:, 1:3])
File "C:\Users\Bhaskar Das\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2139, in __getitem__
return self._getitem_column(key)
among other things. The object being sliced inside the imputer.fit call is definitely a pandas DataFrame, which means the code pasted in the question is likely missing something from your actual code.
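To check this yourself, here is a small sketch of the difference between the two cases. The DataFrame is rebuilt by hand from the rows printed in the question, and the names x_array and x_frame are only illustrative. The exact exception class raised by the DataFrame case depends on the pandas version (the tracebacks in this thread show TypeError), so the sketch catches a generic Exception.

```python
import numpy as np
import pandas as pd

# Rows copied from the Data.csv printout in the question.
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44.0, 27.0, 30.0, 38.0, 40.0, 35.0, np.nan, 48.0, 50.0, 37.0],
    "Salary": [72000.0, 48000.0, 54000.0, 61000.0, np.nan,
               58000.0, 52000.0, 79000.0, 83000.0, 67000.0],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

x_array = dataset.iloc[:, :].values  # numpy ndarray, as in the posted code
x_frame = dataset.iloc[:, :]         # still a DataFrame (no .values)

# Positional slicing works on the ndarray.
print(type(x_array).__name__, x_array[:, 1:3].shape)

# The same syntax on a DataFrame is treated as a column-label lookup and fails.
try:
    x_frame[:, 1:3]
    frame_slice_failed = False
except Exception as exc:  # exception class varies across pandas versions
    frame_slice_failed = True
    print(type(exc).__name__, exc)

# On a DataFrame, positional slicing has to go through .iloc.
print(x_frame.iloc[:, 1:3].shape)
```

Printing type(x) just before the imputer.fit line in the real script is a quick way to confirm which of the two cases applies.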
Note also that you can simply fill in the missing values directly with pandas:
dataset.fillna(
    {'Age': dataset.Age.mean(), 'Salary': dataset.Salary.mean()},
    inplace=True
)
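For reference, a self-contained sketch of the same idea on the Age and Salary values printed in the question. It uses the non-inplace form of fillna, which returns a new DataFrame and leaves the original untouched; the names dataset and filled are only illustrative.

```python
import numpy as np
import pandas as pd

# Age and Salary values copied from the Data.csv printout in the question.
dataset = pd.DataFrame({
    "Age": [44.0, 27.0, 30.0, 38.0, 40.0, 35.0, np.nan, 48.0, 50.0, 37.0],
    "Salary": [72000.0, 48000.0, 54000.0, 61000.0, np.nan,
               58000.0, 52000.0, 79000.0, 83000.0, 67000.0],
})

# mean() skips NaN by default, so each mean is taken over the known values only.
filled = dataset.fillna({
    "Age": dataset["Age"].mean(),
    "Salary": dataset["Salary"].mean(),
})

print(filled)
```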
Upvotes: 2