Reputation: 77
I am new to data science and I want to make a classification from categorical data. I want to encode the data before using the K-means algorithm, but when I call fit_transform() I get the error 'ValueError: bad input shape (2835, 18)' and I don't know how to fix it. I hope someone can help me.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
#load my data
myData = pd.read_excel('panelForOneHot.xlsx')
myData = myData.dropna()
myData.reset_index(drop = True, inplace = True)
myData
values = np.array(myData)
print(values)
#integer encode
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(values)
Upvotes: 1
Views: 83
Reputation: 350
LabelEncoder() expects one-dimensional data, so it rejects an array of shape (2835, 18). Pass a single column to be encoded, as shown below.
# Import label encoder
from sklearn import preprocessing
# label_encoder object knows how to understand word labels.
label_encoder = preprocessing.LabelEncoder()
# Encode labels in column 'species'.
df['species'] = label_encoder.fit_transform(df['species'])
df['species'].unique()
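To see the one-dimensional requirement in isolation, here is a minimal sketch. The toy array is hypothetical and only stands in for the (2835, 18) table from the question; the exact error wording may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical 2D array standing in for the asker's (2835, 18) table.
values = np.array([["red", "small"], ["blue", "large"], ["red", "large"]])

encoder = LabelEncoder()
try:
    # A 2D input with more than one column is rejected.
    encoder.fit_transform(values)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# A single column (1D) is accepted; classes are sorted, so blue=0 and red=1.
first_column_codes = encoder.fit_transform(values[:, 0])
print(first_column_codes)  # [1 0 1]
```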
If you intend to encode all columns,
df.apply(preprocessing.LabelEncoder().fit_transform)
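One caveat with the one-liner is that a single encoder object is refitted on each column, so afterwards it only remembers the classes of the last column. If you need to map the integers back to the original values, a possible variation is to keep one encoder per column. This is only a sketch; the toy frame and the `encoders` dictionary are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame; replace with your own categorical columns.
df = pd.DataFrame({"country": ["FR", "DE", "FR"], "gender": ["F", "M", "M"]})

encoders = {}          # one fitted LabelEncoder per column
encoded = df.copy()
for column in df.columns:
    encoders[column] = LabelEncoder()
    encoded[column] = encoders[column].fit_transform(df[column])

print(encoded)

# Each stored encoder can decode its own column again.
restored = encoders["country"].inverse_transform(encoded["country"])
print(list(restored))  # ['FR', 'DE', 'FR']
```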
And if you intend to encode multiple columns but not all,
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import RobustScaler
from sklearn.preprocessing import OneHotEncoder
categorical_columns = ['country', 'gender']
numerical_columns = ['age']
column_trans = make_column_transformer(
    # each tuple is (transformer, columns) in scikit-learn 0.22 and later
    (OneHotEncoder(handle_unknown='ignore'), categorical_columns),
    (RobustScaler(), numerical_columns),
)
column_trans.fit_transform(df)
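For completeness, here is a self-contained sketch of the column transformer feeding K-means, which is what the question is aiming for. The toy data is made up, the column names simply follow the snippet above, and the choice of two clusters is arbitrary.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Hypothetical toy data; column names follow the snippet above.
df = pd.DataFrame({
    "country": ["FR", "DE", "FR", "US"],
    "gender": ["F", "M", "M", "F"],
    "age": [25, 32, 47, 51],
})

column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["country", "gender"]),
    (RobustScaler(), ["age"]),
    sparse_threshold=0,  # always return a dense array
)
features = column_trans.fit_transform(df)
print(features.shape)  # (4, 6): 3 country + 2 gender + 1 age columns

# n_clusters=2 is an arbitrary choice for this illustration.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(features)
print(cluster_labels)
```

One-hot encoding avoids the artificial ordering that integer labels would introduce into the distance computation K-means relies on.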
Upvotes: 1