I'm trying to make dummy variables in my input set, which has the following form: [image: my input set]
I encoded the categorical data, so my array now has this form: [image: encoded input set]
Next, I would like to make dummy variables using OneHotEncoder. I know that it used to work this way:
onehotencoder = OneHotEncoder(categorical_features = [1])
X = onehotencoder.fit_transform(X).toarray()
But now the OneHotEncoder class works a bit differently, and I can't figure out how to adapt it to my dataset so that it behaves exactly this way. My code:
import numpy as np
import pandas as pd
dataset = pd.DataFrame(
{'RowNumber': [1, 2, 3, 4, 5],
'CustomerId': [602, 311, 304, 354, 888],
'Surname': ['Har', 'Hil', 'Oni', 'Bon', 'Mit'],
'CreditScore': [619, 608, 502, 699, 850],
'Geography': ['FR', 'ES', 'FR', 'FR', 'ES'],
'Gender': ['F', 'F', 'F', 'F', 'F'],
'Age': [42, 41, 42, 39, 43],
'Tenure': [2, 1, 8, 0, 2]})
X = dataset.iloc[:, 3:-1].values  # CreditScore, Geography, Gender, Age
y = dataset.iloc[:, -1].values    # Tenure
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 1] = le.fit_transform(X[:, 1])
X[:, 2] = le.fit_transform(X[:, 2])
# Making dummy variables
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder()
Thank you in advance!
It turns out the API for OneHotEncoder has changed, as described in the documentation. fit_transform now expects a 2D input of shape (n_samples, n_features): instead of a flat 1D array of categories, you pass one row per sample (this is what allows several columns to be one-hot encoded in a single call, if needed).
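To make the shape requirement concrete, here is a small sketch using a made-up array of country codes. It shows that a 1D array is rejected, and that a list of one-element lists and NumPy's reshape(-1, 1) both give the (n_samples, 1) shape OneHotEncoder needs and produce the same encoding.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Made-up sample of categorical values (one feature, five samples)
geography = np.array(['FR', 'ES', 'FR', 'FR', 'ES'], dtype=object)

ohe = OneHotEncoder()

# A flat 1D array is rejected: the encoder wants (n_samples, n_features)
try:
    ohe.fit_transform(geography)
except ValueError as err:
    print('1D input rejected:', err)

as_lists = [[g] for g in geography]   # one-element row per sample
as_column = geography.reshape(-1, 1)  # same data as a (5, 1) array

a = ohe.fit_transform(as_lists).toarray()
b = ohe.fit_transform(as_column).toarray()

print(ohe.categories_)  # categories are sorted: ES, FR
print(b)
```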
Does the following work as you expect?
import numpy as np
import pandas as pd
dataset = pd.DataFrame(
{'RowNumber': [1, 2, 3, 4, 5],
'CustomerId': [602, 311, 304, 354, 888],
'Surname': ['Har', 'Hil', 'Oni', 'Bon', 'Mit'],
'CreditScore': [619, 608, 502, 699, 850],
'Geography': ['FR', 'ES', 'FR', 'FR', 'ES'],
'Gender': ['F', 'F', 'F', 'F', 'F'],
'Age': [42, 41, 42, 39, 43],
'Tenure': [2, 1, 8, 0, 2]})
X = dataset.iloc[:, 3:-1].values  # CreditScore, Geography, Gender, Age
y = dataset.iloc[:, -1].values    # Tenure
# Making dummy variables
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder()
# X[:, [1]] keeps a 2D (n_samples, 1) shape, which OneHotEncoder requires
X1 = ohe.fit_transform(X[:, [1]]).toarray()  # Geography
X2 = ohe.fit_transform(X[:, [2]]).toarray()  # Gender
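This leaves X1 and X2 as separate arrays. In scikit-learn the categorical_features parameter was deprecated in 0.20 and removed in 0.22, and the usual replacement is ColumnTransformer, which encodes the chosen columns and passes the rest through in a single array. The following is a minimal sketch, assuming scikit-learn 0.22 or later and the question's column layout. Because OneHotEncoder accepts string categories directly in these versions, the LabelEncoder step is not needed.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

dataset = pd.DataFrame({
    'CreditScore': [619, 608, 502, 699, 850],
    'Geography': ['FR', 'ES', 'FR', 'FR', 'ES'],
    'Gender': ['F', 'F', 'F', 'F', 'F'],
    'Age': [42, 41, 42, 39, 43]})
# Same column layout as X in the question: CreditScore, Geography, Gender, Age
X = dataset.values

# One-hot encode columns 1 and 2 (Geography, Gender) and pass the remaining
# columns through unchanged, similar to the old categorical_features=[1, 2]
ct = ColumnTransformer(
    [('onehot', OneHotEncoder(), [1, 2])],
    remainder='passthrough',
    sparse_threshold=0,  # return a dense array instead of a sparse matrix
)
X_encoded = ct.fit_transform(X)
print(X_encoded)
```

The dummy columns come first, followed by the passthrough columns (CreditScore, Age). Gender has only one value in this sample, so it yields a single constant column.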
Use pandas.get_dummies() to create dummy variables for a pandas DataFrame:
df = pd.DataFrame({'Country':['France','Spain','Germany','France','Spain','Germany','Germany'],
'Gender':['Male','Female','Male','Female','Male','Male','Female'],
'Age':[52,30,38,45,41,55,29]})
df = pd.get_dummies(data = df, columns = ['Country','Gender'])
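When the dummies feed a linear model, one column per variable is often dropped to avoid perfect collinearity (the "dummy variable trap"). get_dummies supports this through drop_first=True. The sketch below reuses the same DataFrame and passes dtype=int so the output is 0/1 integers regardless of the pandas version (newer versions default to booleans).

```python
import pandas as pd

df = pd.DataFrame({'Country': ['France', 'Spain', 'Germany', 'France', 'Spain', 'Germany', 'Germany'],
                   'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Male', 'Female'],
                   'Age': [52, 30, 38, 45, 41, 55, 29]})

# drop_first=True drops the first (alphabetically sorted) category of each
# encoded column: France for Country and Female for Gender
dummies = pd.get_dummies(df, columns=['Country', 'Gender'], drop_first=True, dtype=int)
print(dummies.columns.tolist())
print(dummies)
```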