juuso

Reputation: 702

bind a label to a given encoding with sklearn LabelEncoder

from sklearn.preprocessing import LabelEncoder
l_labels = ['[PAD]'] + ['NN', 'ADJ', 'PRON'] 
le = LabelEncoder()
le.fit(l_labels)
le.transform(['[PAD]'])

>>>> array([3])

I want the encoding of '[PAD]' to be 0. Is it possible to bind a label to a specific encoding with LabelEncoder?

Upvotes: 1

Views: 168

Answers (2)

Ghassen Sultana

Reputation: 1402

The scikit-learn LabelEncoder sorts the labels before assigning the codes. One way to make '[PAD]' encode to 0 is to rename it to something that sorts first, for example by prefixing it with '0':

l_labels = ['0' + 'PAD'] + ['NN', 'ADJ', 'PRON'] 
le = LabelEncoder()
le.fit(l_labels)
le.transform(['0'+'PAD'])
>> array([0])
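
If renaming the label is not an option, one alternative is to skip LabelEncoder and build the mapping by hand. The following is only a minimal sketch: `build_label_maps` is a hypothetical helper name, not part of scikit-learn, and it assumes the padding label should get 0 while the remaining labels are sorted and numbered from 1.

```python
def build_label_maps(labels, pad_label='[PAD]'):
    """Hypothetical helper: pin pad_label to 0, number the other labels from 1."""
    # Sort the remaining unique labels so the mapping is deterministic.
    rest = sorted(set(labels) - {pad_label})
    classes = [pad_label] + rest
    label_to_id = {label: i for i, label in enumerate(classes)}
    return classes, label_to_id


classes, label_to_id = build_label_maps(['[PAD]', 'NN', 'ADJ', 'PRON'])

# Encode with the dict, decode by indexing into the class list.
encoded = [label_to_id[label] for label in ['[PAD]', 'NN', 'PRON']]
decoded = [classes[i] for i in encoded]

print(encoded)  # [0, 2, 3]
print(decoded)  # ['[PAD]', 'NN', 'PRON']
```

Unlike LabelEncoder, this sketch does no input validation: an unseen label simply raises a KeyError.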

Upvotes: 1

mujjiga

Reputation: 16916

No, you cannot do that with LabelEncoder, because it first finds the unique elements and then sorts them to assign the integer codes.

This is what happens internally in the fit method:

uniques_set = set(values)
uniques_set, missing_values = _extract_missing(uniques_set)

uniques = sorted(uniques_set)

Ref: https://github.com/scikit-learn/scikit-learn/blob/0d378913be6d7e485b792ea36e9268be31ed52d0/sklearn/utils/_encode.py#L135
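
The effect of that sorting step can be reproduced in plain Python. This is a rough illustration of the ordering only, not the library's actual code path: it shows why '[PAD]' lands at index 3, since '[' sorts after the uppercase ASCII letters.

```python
labels = ['[PAD]', 'NN', 'ADJ', 'PRON']

# Same idea as the snippet above: take the unique values, then sort them.
uniques = sorted(set(labels))

# The position in the sorted list becomes the integer code.
codes = {label: i for i, label in enumerate(uniques)}

print(uniques)         # ['ADJ', 'NN', 'PRON', '[PAD]']
print(codes['[PAD]'])  # 3
```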

Upvotes: 1
