Harnold

Reputation: 1

Using OneHotEncoder in making dummy variables

I'm trying to make dummy variables in my input set of the following form: My Input set

I label-encoded the categorical data, so my array now has the form: Encoded input set

Next, I would like to make dummy variables using OneHotEncoder. I know that it used to work this way:

onehotencoder = OneHotEncoder(categorical_features = [1])
X = onehotencoder.fit_transform(X).toarray()

But now the OneHotEncoder class works a bit differently, and I can't figure out how to adapt it to my dataset so that it gives the same result. My code:

import numpy as np
import pandas as pd

dataset = pd.DataFrame(
    {'RowNumber': [1, 2, 3, 4, 5],
     'CustomerId': [602, 311, 304, 354, 888],
     'Surname': ['Har', 'Hil', 'Oni', 'Bon', 'Mit'],
     'CreditScore': [619, 608, 502, 699, 850],
     'Geography': ['FR', 'ES', 'FR', 'FR', 'ES'],
     'Gender': ['F', 'F', 'F', 'F', 'F'],
     'Age': [42, 41, 42, 39, 43],
     'Tenure': [2, 1, 8, 0, 2]})

X = dataset.iloc[:, 3:-1].values
y = dataset.iloc[:, -1].values

# Encoding categorical data
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 1] = le.fit_transform(X[:, 1])
X[:, 2] = le.fit_transform(X[:, 2])

# Making dummy variables
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder()

Thank you in advance!

Upvotes: 0

Views: 905

Answers (2)

user1953384

Reputation: 1059

The API for OneHotEncoder has changed, as the documentation says: the categorical_features parameter is gone, and the encoder expects 2D input of shape (n_samples, n_features). So instead of passing a flat list of category values, you pass a list of single-element lists, one per row (this is what lets it encode several columns in the same call, if needed).

Does the following work as you expect?

import numpy as np
import pandas as pd

dataset = pd.DataFrame(
    {'RowNumber': [1, 2, 3, 4, 5],
     'CustomerId': [602, 311, 304, 354, 888],
     'Surname': ['Har', 'Hil', 'Oni', 'Bon', 'Mit'],
     'CreditScore': [619, 608, 502, 699, 850],
     'Geography': ['FR', 'ES', 'FR', 'FR', 'ES'],
     'Gender': ['F', 'F', 'F', 'F', 'F'],
     'Age': [42, 41, 42, 39, 43],
     'Tenure': [2, 1, 8, 0, 2]})

X = dataset.iloc[:, 3:-1].values
y = dataset.iloc[:, -1].values

# Making dummy variables
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder()
# Index with a list ([1], [2]) so each slice stays 2D, as OneHotEncoder requires
X1 = ohe.fit_transform(X[:, [1]]).toarray()  # Geography
X2 = ohe.fit_transform(X[:, [2]]).toarray()  # Gender
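
If you want the result in a single array, the way categorical_features = [1] used to give it, one option is to wrap the encoder in a ColumnTransformer. The following is only a sketch under a few assumptions: it uses the question's sample data, it encodes both Geography and Gender (columns 1 and 2 of X), and it sets sparse_threshold=0 so the output is always a dense array. The names ct and X_encoded are just illustrative.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

dataset = pd.DataFrame(
    {'RowNumber': [1, 2, 3, 4, 5],
     'CustomerId': [602, 311, 304, 354, 888],
     'Surname': ['Har', 'Hil', 'Oni', 'Bon', 'Mit'],
     'CreditScore': [619, 608, 502, 699, 850],
     'Geography': ['FR', 'ES', 'FR', 'FR', 'ES'],
     'Gender': ['F', 'F', 'F', 'F', 'F'],
     'Age': [42, 41, 42, 39, 43],
     'Tenure': [2, 1, 8, 0, 2]})

# Columns of X: CreditScore, Geography, Gender, Age
X = dataset.iloc[:, 3:-1].values

# One-hot encode columns 1 and 2; pass the remaining columns through unchanged.
# sparse_threshold=0 is an assumption made here to always get a dense array back.
ct = ColumnTransformer(
    transformers=[('onehot', OneHotEncoder(), [1, 2])],
    remainder='passthrough',
    sparse_threshold=0,
)
X_encoded = ct.fit_transform(X)

print(X_encoded.shape)
print(X_encoded)
```

The encoded columns come first in the output, followed by the passed-through columns in their original order. No LabelEncoder step is needed beforehand, since OneHotEncoder accepts string categories directly.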

Upvotes: 1

ManojK

Reputation: 1640

Use pandas.get_dummies() to create dummy variables for a pandas DataFrame:

import pandas as pd

df = pd.DataFrame({'Country': ['France', 'Spain', 'Germany', 'France', 'Spain', 'Germany', 'Germany'],
                   'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Male', 'Female'],
                   'Age': [52, 30, 38, 45, 41, 55, 29]})

# Replace Country and Gender with one indicator column per category
df = pd.get_dummies(data=df, columns=['Country', 'Gender'])
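
Applied to the data from the question, this could look roughly like the sketch below. It assumes you only want the CreditScore, Geography, Gender and Age columns as features; the name features is illustrative, and dtype=int is passed so the dummy columns hold 0/1 integers regardless of the pandas version.

```python
import pandas as pd

dataset = pd.DataFrame(
    {'RowNumber': [1, 2, 3, 4, 5],
     'CustomerId': [602, 311, 304, 354, 888],
     'Surname': ['Har', 'Hil', 'Oni', 'Bon', 'Mit'],
     'CreditScore': [619, 608, 502, 699, 850],
     'Geography': ['FR', 'ES', 'FR', 'FR', 'ES'],
     'Gender': ['F', 'F', 'F', 'F', 'F'],
     'Age': [42, 41, 42, 39, 43],
     'Tenure': [2, 1, 8, 0, 2]})

# Keep the same feature columns the question selects with iloc[:, 3:-1]
features = dataset[['CreditScore', 'Geography', 'Gender', 'Age']]

# Untouched columns stay in place; the dummy columns are appended after them
features = pd.get_dummies(features, columns=['Geography', 'Gender'], dtype=int)

print(features.columns.tolist())
X = features.values
```

Note that Gender has a single category in this sample, so it yields only one constant column (Gender_F).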

Upvotes: 0
