Anhad Singh Arora

Reputation: 21

Encoding Ordinal Values in Python

I'm trying to encode the ordinal categorical values in the 3rd column of my dataset, where "Tiny Mongra" has the lowest value and "1st Wand" has the highest. It is analogous to small, medium and large sizes; here the values denote the size of a grain of rice.

I keep getting the following error when I run this snippet:

Traceback (most recent call last):

  File "<ipython-input-1-ae4501cc0ac1>", line 19, in <module>
    X[:, 2] = ordinalencoder_X_3.fit_transform(X[:, 2])

  File "/Users/anhad/anaconda3/lib/python3.6/site-packages/sklearn/base.py", line 462, in fit_transform
    return self.fit(X, **fit_params).transform(X)

  File "/Users/anhad/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py", line 794, in fit
    self._fit(X)

  File "/Users/anhad/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py", line 61, in _fit
    X = self._check_X(X)

  File "/Users/anhad/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py", line 47, in _check_X
    X_temp = check_array(X, dtype=None)

  File "/Users/anhad/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 552, in check_array
    "if it contains a single sample.".format(array))

ValueError: Expected 2D array, got 1D array instead:
array=['1st Wand' '1st Wand' '1st Wand' ... '1st Wand' '1st Wand' '1st Wand'].

On further inspection, I found that the error is not about the list of categories but about the column I want to encode, which it treats as a 1D array:

array=['1st Wand' '1st Wand' '1st Wand' '1st Wand' '1st Wand' 'Dubar' '2nd Wand'
 'Tibar' 'Mongra' '1st Wand' '1st Wand' '1st Wand' '1st Wand' '1st Wand'
 '1st Wand' '2nd Wand' 'Super Dubar' 'Super Tibar' ... '1st Wand' '1st Wand'].

This is confusing, because LabelEncoder.fit_transform works fine on the other categorical columns in my dataset.

Here is a link to the data (see the "Data" sheet):

https://docs.google.com/spreadsheets/d/12nAU5QztVnVroRYDsRDsZGUyBpBTwAD5yMmbMaAxnHQ/edit?usp=sharing

Here is the full code. Refer to the last part:

import numpy as np
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Ryze Price NN Data.csv')
X = dataset.iloc[:, 1:7].values
y = dataset.iloc[:, 7].values

# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

labelencoder_X_1 = LabelEncoder()
X[:, 0] = labelencoder_X_1.fit_transform(X[:, 0])

labelencoder_X_2 = LabelEncoder()
X[:, 1] = labelencoder_X_2.fit_transform(X[:, 1])

# SEE THIS PART
category_array = ["Tiny Mongra","Mini Mongra","Mongra","Super Mongra","Mini Dubar","Dubar","Super Dubar","Mini Tibar","Tibar","Super Tibar","2nd Wand","Super 2nd Wand","1st Wand"]
ordinalencoder_X_3 = OrdinalEncoder(categories=category_array)
X[:, 2] = ordinalencoder_X_3.fit_transform(np.array(X[:, 2]))

I expect the categories to be encoded in the order of category_array: "Tiny Mongra" as 0, ..., "1st Wand" as 12.

Upvotes: 2

Views: 10282

Answers (4)

Sanjiban Mohanty

Reputation: 1

Try using the following:

ordinalencoder_X = OrdinalEncoder()
X[:, 0:3] = ordinalencoder_X.fit_transform(pd.DataFrame(X[:, 0:3]))

Upvotes: 0

Sanjiban Mohanty

Reputation: 1

Try using the following:

ordinalencoder_X = OrdinalEncoder()
X[:, 0:3] = ordinalencoder_X.fit_transform(X[:, 0:3])

Upvotes: 0

Aditya Kansal

Reputation: 171

Instead of using OrdinalEncoder, another option is to use the pandas applymap function with a lambda that looks up each value in a mapping dictionary.

Here is the mapping dictionary:

mapping = {"Tiny Mongra": 0, "Mini Mongra": 1, "Mongra": 2, "Super Mongra": 3,
           "Mini Dubar": 4, "Dubar": 5, "Super Dubar": 6, "Mini Tibar": 7,
           "Tibar": 8, "Super Tibar": 9, "2nd Wand": 10, "Super 2nd Wand": 11,
           "1st Wand": 12}

Let's say this is my DataFrame:

df = pd.DataFrame(['Tiny Mongra', 'Mini Dubar', 'Mongra', '1st Wand', '1st Wand',
                   'Dubar', '2nd Wand', 'Tibar', 'Mongra', 'Super Dubar',
                   '1st Wand', '1st Wand', '1st Wand', '1st Wand', '1st Wand',
                   '2nd Wand', 'Super Dubar', 'Super Tibar', '1st Wand', '1st Wand'],
                  columns=['category'])

Then you can create an encoded column with:

df['mapped_category'] = df.applymap(lambda x : mapping[x])
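A per-column alternative, sketched here and not part of the original answer, is Series.map with the same dictionary. It only touches the category column (applymap runs over every column of the DataFrame), and values missing from the dict become NaN instead of raising a KeyError. Note also that pandas 2.1 deprecated DataFrame.applymap in favor of DataFrame.map. "Unknown" below is a hypothetical value added only to show the missing-key behavior.

```python
import pandas as pd

# Same ordering as category_array in the question: smallest grain -> 0.
mapping = {
    "Tiny Mongra": 0, "Mini Mongra": 1, "Mongra": 2, "Super Mongra": 3,
    "Mini Dubar": 4, "Dubar": 5, "Super Dubar": 6, "Mini Tibar": 7,
    "Tibar": 8, "Super Tibar": 9, "2nd Wand": 10, "Super 2nd Wand": 11,
    "1st Wand": 12,
}

# "Unknown" is a hypothetical value that is not in the mapping.
df = pd.DataFrame({"category": ["Tiny Mongra", "Dubar", "1st Wand", "Unknown"]})

# Series.map looks up each value in the dict; values missing from the
# dict become NaN instead of raising KeyError like mapping[x] would.
df["mapped_category"] = df["category"].map(mapping)

# Collect any categories that were not in the mapping so they can be inspected.
unmapped = df.loc[df["mapped_category"].isna(), "category"].unique()
print(df)
print("unmapped:", list(unmapped))
```

Because NaN is a float, the mapped column becomes float64 whenever a value is missing; checking `unmapped` before training is a cheap way to catch typos in the data.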

Upvotes: 2

Rafaó

Reputation: 599

The main distinction between LabelEncoder and OrdinalEncoder is their purpose:

  • LabelEncoder should be used for target variables,
  • OrdinalEncoder should be used for feature variables.

In general they work the same, but:

  • LabelEncoder needs y: array-like of shape [n_samples],
  • OrdinalEncoder needs X: array-like, shape [n_samples, n_features].
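To make the shape difference concrete, here is a small sketch with a few hypothetical values standing in for the question's X[:, 2] column, assuming a recent scikit-learn. reshape(-1, 1) turns the 1D column into the single-feature 2D array OrdinalEncoder expects.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical sample standing in for the question's X[:, 2] column.
col = np.array(["Dubar", "1st Wand", "Tiny Mongra", "1st Wand"], dtype=object)
print(col.shape)  # (4,) -> 1D, fine for LabelEncoder

# LabelEncoder accepts a 1D array of labels.
le_codes = LabelEncoder().fit_transform(col)

# OrdinalEncoder expects 2D input of shape (n_samples, n_features),
# so the column is reshaped into a single-feature matrix.
col_2d = col.reshape(-1, 1)
print(col_2d.shape)  # (4, 1)
oe_codes = OrdinalEncoder().fit_transform(col_2d)

# Passing the 1D column directly raises the ValueError from the question.
try:
    OrdinalEncoder().fit_transform(col)
except ValueError as err:
    print("1D input rejected:", err)
```

Slicing with a range, such as X[:, 2:3], also keeps the column 2D, which is why the X[:, 0:3] slice below works without a reshape.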

If you just want to encode your categorical variable's values to 0, 1, ..., n, use LabelEncoder the same way you did for the first two columns.

labelencoder_X_3 = LabelEncoder()
X[:, 2] = labelencoder_X_3.fit_transform(X[:, 2])
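One caveat worth checking: LabelEncoder numbers the labels in sorted (lexicographic) order, not in the size order of the question's category_array, so "Tiny Mongra" will not necessarily get 0. A quick check with a few hypothetical labels:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical subset of the rice-size labels, in arbitrary order.
labels = ["Tiny Mongra", "1st Wand", "Dubar", "Super Tibar"]
le = LabelEncoder()
codes = le.fit_transform(labels)

# classes_ holds the sorted unique labels; each label's code is its index here.
print(le.classes_)
print(codes)
```

Here "1st Wand" sorts first (digits come before letters), so it receives 0 and "Tiny Mongra" receives the largest code, the reverse of the intended ordering.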

But I would transform all three variables with OrdinalEncoder at the same time:

ordinalencoder_X = OrdinalEncoder()
X[:, 0:3] = ordinalencoder_X.fit_transform(X[:, 0:3])
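Note that OrdinalEncoder() with the default categories='auto' also sorts each column's values, so neither approach above yields the "Tiny Mongra" = 0, ..., "1st Wand" = 12 order the question asks for. A minimal sketch for the size column alone, assuming it is encoded separately from the first two columns: pass categories as a list containing one list per column, and give the encoder 2D input. The sample values are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Size order from the question: smallest grain first.
category_array = ["Tiny Mongra", "Mini Mongra", "Mongra", "Super Mongra",
                  "Mini Dubar", "Dubar", "Super Dubar", "Mini Tibar", "Tibar",
                  "Super Tibar", "2nd Wand", "Super 2nd Wand", "1st Wand"]

# Hypothetical stand-in for X[:, 2]; the real column comes from the CSV.
col = np.array(["1st Wand", "Dubar", "Tiny Mongra", "2nd Wand"], dtype=object)

# categories takes one list per feature column, hence the extra brackets;
# the input is reshaped to (n_samples, 1) because OrdinalEncoder wants 2D.
encoder = OrdinalEncoder(categories=[category_array])
codes = encoder.fit_transform(col.reshape(-1, 1)).ravel()
print(codes)
```

In the question's code this would correspond to something like X[:, 2] = encoder.fit_transform(X[:, 2:3]).ravel(). If the column contains a value missing from category_array, fit raises a ValueError about unknown categories rather than silently assigning a code.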

Upvotes: 5
