SKLearn: Dummy Variables for Label Encoded Categorical Values

Question

I begin by selecting my feature matrix X from the Excel dataset and converting it to a NumPy array:

X = dataset.iloc[:, 3:13].values

X has two columns that I need to label encode: country and gender. There are three countries (Spain, France, and Germany) and only two genders. I label encode them:

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1]) # the three countries
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
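
For reference, here is a minimal sketch of what the label encoding step produces. It assumes a small made-up country column (the `countries` array is hypothetical, not taken from the real dataset) and only illustrates that LabelEncoder assigns integer codes in sorted label order:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of the country column; not the real data.
countries = np.array(["Spain", "France", "Germany", "France"], dtype=object)

encoder = LabelEncoder()
codes = encoder.fit_transform(countries)

# classes_ holds the sorted unique labels; each label's code is its index there.
print(encoder.classes_)  # ['France' 'Germany' 'Spain']
print(codes)             # [2 0 1 0]
```

The codes 0, 1, 2 imply an ordering that the countries do not actually have, which is why a one-hot step is wanted afterwards.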

Okay, now I need to create dummy variables for the three countries, since they don't exist in a hierarchical relationship where one value is higher than another. However, the following code doesn't work:

onehotencoder = OneHotEncoder(categorical_features = [1])
X = onehotencoder.fit_transform(X).toarray()
X = X[:, 1:]

I read that ColumnTransformer combined with OneHotEncoder is now used to create dummy variables, but I am having difficulty figuring it out. I did import the necessary packages. I tried this, but it still does not work:

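from sklearn.compose import ColumnTransformer
# One-hot encode column 1 (country) and pass the other columns through unchanged
columnTransformer = ColumnTransformer([('encoder', OneHotEncoder(), [1])], remainder='passthrough')
X = columnTransformer.fit_transform(X)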

Can someone help? Thanks. I just want to one-hot encode the three countries in the beginning after they are label encoded.
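
One possible way to do this is sketched below. It is not a definitive fix for the exact dataset: the sample rows, column meanings, and variable names (`column_transformer`, `X_encoded`) are made up for illustration. It assumes a scikit-learn version that provides `ColumnTransformer` and the `drop` option of `OneHotEncoder`. Note that `OneHotEncoder` accepts string categories directly, so label encoding the country column first is not required for this step.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows: [credit score, country, gender code, age]; made up for illustration.
X = np.array(
    [
        [619, "France", 0, 42],
        [608, "Spain", 0, 41],
        [502, "Germany", 1, 30],
        [699, "France", 1, 39],
    ],
    dtype=object,
)

# One-hot encode column 1 only. drop="first" removes one dummy column,
# which is the role X = X[:, 1:] played in the older code.
column_transformer = ColumnTransformer(
    [("encoder", OneHotEncoder(drop="first"), [1])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)

# The passthrough columns are object-typed, so cast the result to float.
X_encoded = np.asarray(column_transformer.fit_transform(X), dtype=float)

print(X_encoded.shape)  # (4, 5)
print(X_encoded[0])
```

The encoded country columns come first in the output, followed by the remaining columns in their original order. With `drop="first"`, France (the first category in sorted order) becomes the baseline and is represented by zeros in both dummy columns.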

