Prance

Reputation: 13

Specifying the columns using strings is only supported for pandas DataFrames

I want to one-hot encode several columns. I have tried several approaches, including a plain OneHotEncoder, ColumnTransformer, make_column_transformer, Pipeline, and get_dummies, but every attempt fails with a different error.

x = dataset.iloc[:, :11].values
y = dataset.iloc[:, 11].values


""" data encoding """

from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline


# oe = OrdinalEncoder()
# x = oe.fit_transform(x)

non_cat = ["Make", "Model", "Vehicle", "Transmission", "Fuel"]

onehot_cat = ColumnTransformer([
    ("categorical", OrdinalEncoder(), non_cat),
    ("onehot_categorical", OneHotEncoder(), non_cat)],
    remainder= "passthrough")
x = onehot_cat.fit_transform(x)

error:

[['ACURA' 'ILX' 'COMPACT' ... 6.7 8.5 33]
['ACURA' 'ILX' 'COMPACT' ... 7.7 9.6 29]
['ACURA' 'ILX HYBRID' 'COMPACT' ... 5.8 5.9 48]
...
['VOLVO' 'XC60 T6 AWD' 'SUV - SMALL' ... 8.6 10.3 27]
['VOLVO' 'XC90 T5 AWD' 'SUV - STANDARD' ... 8.3 9.9 29]
['VOLVO' 'XC90 T6 AWD' 'SUV - STANDARD' ... 8.7 10.7 26]]
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\Anaconda3\lib\site-packages\sklearn\utils\__init__.py in _get_column_indices(X, key)
424         try:
--> 425             all_columns = X.columns
426         except AttributeError:

AttributeError: 'numpy.ndarray' object has no attribute 'columns'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-4-4008371c305f> in <module>
 24     ("onehot_categorical", OneHotEncoder(), non_cat)],
 25     remainder= "passthrough")
 ---> 26 x = onehot_cat.fit_transform(x)
 27 
 28 print('OneHotEncode = ', x.shape)

~\Anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in fit_transform(self, X, y)
527         self._validate_transformers()
528         self._validate_column_callables(X)
--> 529         self._validate_remainder(X)
530 
531         result = self._fit_transform(X, y, _fit_transform_one)

~\Anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in _validate_remainder(self, X)
325         cols = []
326         for columns in self._columns:
--> 327             cols.extend(_get_column_indices(X, columns))
328 
329         remaining_idx = sorted(set(range(self._n_features)) - set(cols))

~\Anaconda3\lib\site-packages\sklearn\utils\__init__.py in _get_column_indices(X, key)
425             all_columns = X.columns
426         except AttributeError:
--> 427             raise ValueError("Specifying the columns using strings is only "
428                              "supported for pandas DataFrames")
429         if isinstance(key, str):

ValueError: Specifying the columns using strings is only supported for pandas DataFrames

Upvotes: 1

Views: 3971

Answers (1)

Carlos Ferreira

Reputation: 2078

I got a similar error when making predictions with a model. It expected a pandas DataFrame, but I was passing a NumPy array instead, so I changed the call from:

prediction = monitor_model.predict(s_df.to_numpy())

to:

prediction = monitor_model.predict(s_df)
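
The same idea seems to apply to the code in the question: `.values` turns the DataFrame into a NumPy array, so the ColumnTransformer can no longer look up columns by name. Below is a minimal sketch, not a definitive fix. The small `dataset` is a hypothetical stand-in (column names taken from the question, values and the two numeric columns invented for illustration), and it uses only OneHotEncoder, since listing the same columns for both OrdinalEncoder and OneHotEncoder would encode them twice.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the asker's dataset; values are made up.
dataset = pd.DataFrame({
    "Make": ["ACURA", "ACURA", "VOLVO"],
    "Model": ["ILX", "ILX HYBRID", "XC60 T6 AWD"],
    "Vehicle": ["COMPACT", "COMPACT", "SUV - SMALL"],
    "Transmission": ["A", "B", "C"],   # placeholder categories
    "Fuel": ["Z", "Z", "Z"],           # placeholder categories
    "Engine": [2.0, 1.5, 2.0],         # assumed numeric feature
    "Target": [33, 48, 27],            # assumed target column
})

cat_cols = ["Make", "Model", "Vehicle", "Transmission", "Fuel"]

# Option 1: keep X as a DataFrame (no .values) so string column names work.
X = dataset.iloc[:, :6]
y = dataset.iloc[:, 6].values

ct = ColumnTransformer(
    [("onehot_categorical", OneHotEncoder(), cat_cols)],
    remainder="passthrough",
)
X_encoded = ct.fit_transform(X)

# Option 2: if X must be a NumPy array, select columns by integer position.
X_array = X.values
ct_idx = ColumnTransformer(
    [("onehot_categorical", OneHotEncoder(), [0, 1, 2, 3, 4])],
    remainder="passthrough",
)
X_encoded_idx = ct_idx.fit_transform(X_array)

print(X_encoded.shape, X_encoded_idx.shape)
```

Either option avoids passing string column names together with an array that has no `columns` attribute, which is what the ValueError complains about.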

Upvotes: 1
