QueryQuasar

Reputation: 171

How to One Hot Encode a Dataframe Column in Python?

I'm trying to one-hot encode a DataFrame column with OneHotEncoder, using this code:

from sklearn.preprocessing import OneHotEncoder
df['label'] = OneHotEncoder().fit(df['label']).toarray()

This is the traceback

ValueError: Expected 2D array, got 1D array instead:
  array=['Label1' 'Label1' 'Label1' 'Label1' 'Label1'
 'Label1' 'Label1' 'Label1' 'Label1' 'Label1'
  'Label2' 'Label2' 'Label2' 'Label2' 'Label2' 'Label2' 'Label2'
 'Label2' 'Label2' 'Label2' 'Label3' 'Label3' 'Label3' 'Label3' 'Label3' 'Label3'
 'Label3' 'Label3' 'Label3' 'Label3' 'Label4' 'Label4' 'Label4' 'Label4' 'Label4' 'Label4'
 'Label4' 'Label4' 'Label4' 'Label4' 'Label5' 'Label5' 'Label5'
 'Label5' 'Label5' 'Label5' 'Label5' 'Label5'
 'Label5' 'Label5' 'Label6' 'Label6' 'Label6'
 'Label6' 'Label6' 'Label6' 'Label6' 'Label6'
 'Label6' 'Label6' 'Label7' 'Label7' 'Label7'
 'Label7' 'Label7' 'Label7' 'Label7' 'Label7'
 'Label7' 'Label7' 'Label8' 'Label8' 'Label8' 'Label8' 'Label8'
 'Label8' 'Label8' 'Label8' 'Label8' 'Label8' 'Label9' 'Label9'
 'Label9' 'Label9' 'Label9' 'Label9' 'Label9' 'Label9'
 'Label9' 'Label9' 'Label10' 'Label10' 'Label10' 'Label10' 'Label10'
 'Label10' 'Label10' 'Label10' 'Label10' 'Label10' 'Label11' 'Label11'
 'Label11' 'Label11' 'Label11' 'Label11' 'Label11' 'Label11' 'Label11' 'Label11'
 'Label12' 'Label12' 'Label12' 'Label12' 'Label12' 'Label12'
 'Label12' 'Label12' 'Label12' 'Label12'].
  Reshape your data either using array.reshape(-1, 1) if your data has a single feature or 
  array.reshape(1, -1) if it contains a single sample.

I already tried reshaping, but that fails with an error saying a Series has no attribute reshape. What is a workaround to use OneHotEncoder here?

Upvotes: 2

Views: 7416

Answers (3)

msbeigi

Reputation: 349

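import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# drop='first' omits the column for the first category
encoder = OneHotEncoder(drop='first')
# Double brackets select a one-column DataFrame, which is the 2D input the encoder expects
onehot = encoder.fit_transform(df[['label']])
feature_name = encoder.categories_[0]
# Skip the first category name, since its column was dropped
onehot_df = pd.DataFrame(onehot.toarray(), columns=feature_name[1:], index=df.index)
# Append the encoded columns, then remove the original column
df = pd.concat([df, onehot_df], axis=1)
df.drop('label', axis=1, inplace=True)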

Upvotes: 0

roman_ka

Reputation: 598

pandas has a dedicated function for this, called get_dummies:

pd.get_dummies(df['label'])
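
As a rough sketch of how the result could be joined back onto the original frame: the sample data below is made up for illustration, the column name `label` follows the question, and the `prefix` and `dtype` arguments are optional choices rather than requirements.

```python
import pandas as pd

# Illustrative data only; the column name 'label' follows the question.
df = pd.DataFrame({'label': ['Label1', 'Label4', 'Label2', 'Label2']})

# One indicator column per distinct value; dtype=int yields 0/1 instead of booleans.
dummies = pd.get_dummies(df['label'], prefix='label', dtype=int)

# Replace the original column with the indicator columns.
encoded = pd.concat([df.drop(columns='label'), dummies], axis=1)
print(encoded.columns.tolist())
# ['label_Label1', 'label_Label2', 'label_Label4']
```

Note that get_dummies only knows the categories present in the data it is given, so for a train/test split the OneHotEncoder approach may be the safer option.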

Upvotes: 1

user11989081

Reputation: 8663

See below, but note that you cannot assign the output of OneHotEncoder to a single DataFrame column, because it produces one column per category. I suspect that you are looking for LabelEncoder instead.

OneHotEncoder

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    'label': ['Label1', 'Label4', 'Label2', 'Label2', 'Label1', 'Label3', 'Label3']
})

X = df['label'].values.reshape(-1, 1)
enc = OneHotEncoder().fit(X)

X = enc.transform(X).toarray()
print(X)
# [[1. 0. 0. 0.]
#  [0. 0. 0. 1.]
#  [0. 1. 0. 0.]
#  [0. 1. 0. 0.]
#  [1. 0. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 0. 1. 0.]]

X = enc.inverse_transform(X)
print(X)
# [['Label1']
#  ['Label4']
#  ['Label2']
#  ['Label2']
#  ['Label1']
#  ['Label3']
#  ['Label3']]

LabelEncoder

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    'label': ['Label1', 'Label4', 'Label2', 'Label2', 'Label1', 'Label3', 'Label3']
})

y = df['label'].values
enc = LabelEncoder().fit(y)

y = enc.transform(y)
print(y)
# [0 3 1 1 0 2 2]

y = enc.inverse_transform(y)
print(y)
# ['Label1' 'Label4' 'Label2' 'Label2' 'Label1' 'Label3' 'Label3']

Upvotes: 1
