ValueError: bad input shape (2835, 18)

I am new to data science and I want to classify categorical data. I want to encode the data before running the K-means algorithm, but I get the error 'ValueError: bad input shape (2835, 18)' when I call fit_transform(), and I don't know how to fix it. I hope someone can help me.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder

# load my data and drop rows with missing values
myData = pd.read_excel('panelForOneHot.xlsx')
myData = myData.dropna()
myData.reset_index(drop = True, inplace = True)
myData  # displays the DataFrame in a notebook

values = np.array(myData)  # 2D array of shape (2835, 18)
print(values)

# integer encode
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(values)  # raises ValueError: bad input shape (2835, 18)

Answers (1)

codeblooded

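LabelEncoder() expects one-dimensional data: it is designed to encode a single column of labels, so a 2D array such as the (2835, 18) array in the question is rejected. Pass a specific field to be encoded, as shown below.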

# Import label encoder
from sklearn import preprocessing

# LabelEncoder maps each distinct label in one column to an integer 0..n_classes-1
label_encoder = preprocessing.LabelEncoder()

# Encode labels in column 'species' (df is a DataFrame that has this column)
df['species'] = label_encoder.fit_transform(df['species'])

df['species'].unique()
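To see why the call in the question fails, here is a minimal sketch using a small made-up array as a stand-in for the data from panelForOneHot.xlsx (the values and column meanings are hypothetical). Passing the 2D array to LabelEncoder raises a ValueError (the exact wording varies between scikit-learn versions), while a 1D column slice works. For encoding a whole feature matrix at once, scikit-learn also provides OrdinalEncoder, which accepts 2D input.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical stand-in for the question's data: 4 rows x 2 categorical columns
values = np.array([
    ["red", "small"],
    ["blue", "large"],
    ["red", "large"],
    ["green", "small"],
])

# LabelEncoder only accepts 1D input, so the 2D array is rejected
try:
    LabelEncoder().fit_transform(values)
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)

# Encoding a single column (a 1D slice) works; classes are sorted alphabetically
first_col = LabelEncoder().fit_transform(values[:, 0])
print(first_col)  # blue=0, green=1, red=2 -> [2 0 2 1]

# OrdinalEncoder is the 2D counterpart intended for feature matrices
ordinal = OrdinalEncoder().fit_transform(values)
print(ordinal)
```

Since the goal is clustering the 18 feature columns rather than encoding a target, OrdinalEncoder (or OneHotEncoder) applied to the whole array is usually a closer fit than LabelEncoder.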

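If you intend to encode all columns, apply the encoder to each column in turn:

from sklearn.preprocessing import LabelEncoder
df_encoded = df.apply(LabelEncoder().fit_transform)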
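One caveat with df.apply(LabelEncoder().fit_transform): the single encoder instance is refitted on every column, so afterwards it only remembers the mapping of the last column. A minimal sketch, assuming a small hypothetical DataFrame, that keeps one fitted encoder per column so each column can later be decoded with inverse_transform:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical DataFrame with two categorical columns
df = pd.DataFrame({
    "colour": ["red", "blue", "red", "green"],
    "size": ["small", "large", "large", "small"],
})

# One encoder per column, keyed by column name
encoders = {col: LabelEncoder() for col in df.columns}

# apply() passes each column as a Series whose .name is the column label
encoded = df.apply(lambda s: encoders[s.name].fit_transform(s))
print(encoded)

# Recover the original labels of one column from its own encoder
print(encoders["colour"].inverse_transform(encoded["colour"]))
```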

And if you intend to encode multiple columns but not all,

from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, RobustScaler

categorical_columns = ['country', 'gender']  # columns to one-hot encode
numerical_columns = ['age']                   # columns to scale
# current scikit-learn expects (transformer, columns) tuples
column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown='ignore'), categorical_columns),
    (RobustScaler(), numerical_columns),
)
X = column_trans.fit_transform(df)
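Since the question's end goal is K-means, here is a minimal end-to-end sketch that feeds the transformed matrix to KMeans. The DataFrame is hypothetical (it only reuses the column names from the answer), n_clusters=2 is an arbitrary choice for the example, and it assumes a scikit-learn version where make_column_transformer takes (transformer, columns) tuples.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Hypothetical data using the column names from the answer
df = pd.DataFrame({
    "country": ["FR", "DE", "FR", "US", "DE", "US"],
    "gender": ["F", "M", "M", "F", "F", "M"],
    "age": [23, 35, 31, 52, 44, 29],
})

# One-hot encode the categorical columns, robust-scale the numeric one
column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["country", "gender"]),
    (RobustScaler(), ["age"]),
)
X = column_trans.fit_transform(df)  # 3 country + 2 gender + 1 age columns

# Cluster the encoded rows; k=2 is an arbitrary example value
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(X.shape, labels)
```

KMeans accepts both dense and sparse input, so this works whether ColumnTransformer returns a dense array or a sparse matrix.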
