Reputation: 21
I'm trying to encode ordinal categorical values in the 3rd column of my dataset, where "Tiny Mongra" has the lowest value and "1st Wand" has the highest. It is analogous to using small, medium and large sizes; in this dataset the category denotes the size of a grain of rice.
I keep getting the following error when I run this snippet:
Traceback (most recent call last):
File "<ipython-input-1-ae4501cc0ac1>", line 19, in <module>
X[:, 2] = ordinalencoder_X_3.fit_transform(X[:, 2])
File "/Users/anhad/anaconda3/lib/python3.6/site-packages/sklearn/base.py", line 462, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "/Users/anhad/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py", line 794, in fit
self._fit(X)
File "/Users/anhad/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py", line 61, in _fit
X = self._check_X(X)
File "/Users/anhad/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py", line 47, in _check_X
X_temp = check_array(X, dtype=None)
File "/Users/anhad/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 552, in check_array
"if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=['1st Wand' '1st Wand' '1st Wand' ... '1st Wand' '1st Wand' '1st Wand'].
On further inspection, I found that the error was not about the list of categories but about the column I wanted encoded. For some reason it treats that column as a 1D array of the form:
array=['1st Wand' '1st Wand' '1st Wand' '1st Wand' '1st Wand' 'Dubar' '2nd Wand'
'Tibar' 'Mongra' '1st Wand' '1st Wand' '1st Wand' '1st Wand' '1st Wand'
'1st Wand' '2nd Wand' 'Super Dubar' 'Super Tibar' ... '1st Wand' '1st Wand'].
This is weird since I am using LabelEncoder to fit_transform other categorical values in my dataset and they work fine.
Here is a link to the Data. See "Data" sheet:
https://docs.google.com/spreadsheets/d/12nAU5QztVnVroRYDsRDsZGUyBpBTwAD5yMmbMaAxnHQ/edit?usp=sharing
Here is the full code. Refer to the last part:
import numpy as np
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Ryze Price NN Data.csv')
X = dataset.iloc[:, 1:7].values
y = dataset.iloc[:, 7].values
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 0] = labelencoder_X_1.fit_transform(X[:, 0])
labelencoder_X_2 = LabelEncoder()
X[:, 1] = labelencoder_X_2.fit_transform(X[:, 1])
# SEE THIS PART
category_array = ["Tiny Mongra","Mini Mongra","Mongra","Super Mongra","Mini Dubar","Dubar","Super Dubar","Mini Tibar","Tibar","Super Tibar","2nd Wand","Super 2nd Wand","1st Wand"]
ordinalencoder_X_3 = OrdinalEncoder(categories=category_array)
X[:, 2] = ordinalencoder_X_3.fit_transform(np.array(X[:, 2]))
I expect the categorical data to be encoded in order: "Tiny Mongra" should be encoded as 0, and so on up to "1st Wand", which should be encoded as 12.
Upvotes: 2
Views: 10282
Reputation: 1
Try the following. X is a NumPy array here, so it is sliced directly rather than with .iloc:
ordinalencoder_X = OrdinalEncoder()
X[:, 0:3] = ordinalencoder_X.fit_transform(pd.DataFrame(X[:, 0:3]))
Upvotes: 0
Reputation: 1
Try the following. The slice X[:, 0:3] is already 2D, so it is passed as-is without wrapping it in a list:
ordinalencoder_X = OrdinalEncoder()
X[:, 0:3] = ordinalencoder_X.fit_transform(X[:, 0:3])
Upvotes: 0
Reputation: 171
Instead of using OrdinalEncoder, another option is to use the pandas applymap function and pass the mapping dictionary through a lambda function.
Here is the mapping dictionary:
mapping = {"Tiny Mongra": 0, "Mini Mongra": 1, "Mongra": 2, "Super Mongra": 3,
           "Mini Dubar": 4, "Dubar": 5, "Super Dubar": 6, "Mini Tibar": 7,
           "Tibar": 8, "Super Tibar": 9, "2nd Wand": 10, "Super 2nd Wand": 11,
           "1st Wand": 12}
Let's say this is my dataframe:
df = pd.DataFrame(['Tiny Mongra', 'Mini Dubar', 'Mongra', '1st Wand', '1st Wand',
                   'Dubar', '2nd Wand', 'Tibar', 'Mongra', 'Super Dubar',
                   '1st Wand', '1st Wand', '1st Wand', '1st Wand', '1st Wand',
                   '2nd Wand', 'Super Dubar', 'Super Tibar', '1st Wand', '1st Wand'],
                  columns=['category'])
Then you can create an encoded column with the code below:
df['mapped_category'] = df.applymap(lambda x : mapping[x])
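As a variation on the mapping approach above, here is a minimal sketch that applies the dictionary to a single column with Series.map instead of applymap. The short dataframe and the "Unknown Size" value are made up for illustration; they are not from the question's data.

```python
import pandas as pd

# Same ordering as the mapping dictionary above.
mapping = {"Tiny Mongra": 0, "Mini Mongra": 1, "Mongra": 2, "Super Mongra": 3,
           "Mini Dubar": 4, "Dubar": 5, "Super Dubar": 6, "Mini Tibar": 7,
           "Tibar": 8, "Super Tibar": 9, "2nd Wand": 10, "Super 2nd Wand": 11,
           "1st Wand": 12}

# Hypothetical sample; "Unknown Size" is deliberately absent from the mapping.
df = pd.DataFrame({"category": ["Tiny Mongra", "Mini Dubar", "1st Wand", "Unknown Size"]})

# Series.map looks each value up in the dict; values without a key become NaN
# instead of raising KeyError the way mapping[x] inside a lambda would.
df["mapped_category"] = df["category"].map(mapping)
print(df)
```

Because of the NaN, the new column ends up as floats here; checking df["mapped_category"].isna() is one way to spot categories that are missing from the dictionary.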
Upvotes: 2
Reputation: 599
The main distinction between LabelEncoder and OrdinalEncoder is their purpose:
LabelEncoder should be used for target variables,
OrdinalEncoder should be used for feature variables.
In general they work the same, but:
LabelEncoder needs y: array-like of shape [n_samples],
OrdinalEncoder needs X: array-like, shape [n_samples, n_features].
If you just want to encode your categorical variable's values to 0, 1, ..., n, use LabelEncoder the same way you did with X1 and X2.
labelencoder_X_3 = LabelEncoder()
X[:, 2] = labelencoder_X_3.fit_transform(X[:, 2])
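To make the shape difference concrete, here is a small sketch on a made-up column (the four values are illustrative, not the real dataset). It passes the same data to both encoders, reshaping it to a single-feature 2D array for OrdinalEncoder.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical column of sizes, shape (4,).
col = np.array(["Tibar", "Dubar", "1st Wand", "Dubar"], dtype=object)

# LabelEncoder accepts the 1D array directly.
label_codes = LabelEncoder().fit_transform(col)

# OrdinalEncoder expects (n_samples, n_features), so reshape to (4, 1).
ordinal_codes = OrdinalEncoder().fit_transform(col.reshape(-1, 1))

print(label_codes.shape, ordinal_codes.shape)
print(label_codes)
```

Note that, without an explicit category list, both encoders number the categories in sorted order, so "1st Wand" gets code 0 in this sample rather than the highest code.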
But I would transform all three variables with OrdinalEncoder at the same time:
ordinalencoder_X = OrdinalEncoder()
X[:, 0:3] = ordinalencoder_X.fit_transform(X[:, 0:3])
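If the custom order from the question matters ("Tiny Mongra" as 0 up to "1st Wand" as 12), the following sketch may help. It assumes two things about OrdinalEncoder: the categories parameter takes one list per feature (so the order list is wrapped in an outer list), and the input must be 2D. The four-element sizes array is a stand-in for X[:, 2], not the real data.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Order taken from the question: smallest grain first, largest last.
category_order = ["Tiny Mongra", "Mini Mongra", "Mongra", "Super Mongra",
                  "Mini Dubar", "Dubar", "Super Dubar", "Mini Tibar", "Tibar",
                  "Super Tibar", "2nd Wand", "Super 2nd Wand", "1st Wand"]

# Hypothetical sample column standing in for X[:, 2].
sizes = np.array(["1st Wand", "Tiny Mongra", "Dubar", "2nd Wand"], dtype=object)

# One list of categories per feature, hence the outer list.
encoder = OrdinalEncoder(categories=[category_order])

# reshape(-1, 1) turns the 1D column into shape (n_samples, 1);
# ravel() flattens the encoded result back to 1D.
encoded = encoder.fit_transform(sizes.reshape(-1, 1)).ravel()
print(encoded)
```

Applied to the question's array, the equivalent would presumably be X[:, 2] = encoder.fit_transform(X[:, [2]]).ravel(), where X[:, [2]] keeps the column two-dimensional.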
Upvotes: 5