Reputation: 702
from sklearn.preprocessing import LabelEncoder
l_labels = ['[PAD]'] + ['NN', 'ADJ', 'PRON']
le = LabelEncoder()
le.fit(l_labels)
le.transform(['[PAD]'])  # -> array([3])
I want the encoding of '[PAD]' to be 0. Is it possible to bind a label to a specific encoding with LabelEncoder?
Upvotes: 1
Views: 168
Reputation: 1402
scikit-learn's LabelEncoder sorts the unique labels before assigning codes, so one way to make 'PAD' encode to 0 is to rename it to something that sorts first, for example by prefixing it with '0':
from sklearn.preprocessing import LabelEncoder
l_labels = ['0' + 'PAD'] + ['NN', 'ADJ', 'PRON']
le = LabelEncoder()
le.fit(l_labels)
le.transform(['0' + 'PAD'])  # -> array([0])
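The rename works because Python compares strings character by character by Unicode code point: '0' (48) comes before the uppercase letters ('A' is 65), while '[' (91) comes after them. A minimal sketch using only the built-in sorted(), which gives the same order LabelEncoder uses when it sorts the labels:

```python
# Code points of the first characters being compared.
print(ord('0'), ord('A'), ord('['))  # 48 65 91

# Original labels: '[PAD]' sorts last, so it ends up with code 3.
original = ['[PAD]', 'NN', 'ADJ', 'PRON']
print(sorted(original))  # ['ADJ', 'NN', 'PRON', '[PAD]']

# Renamed labels: '0PAD' sorts first, so it ends up with code 0.
renamed = ['0PAD', 'NN', 'ADJ', 'PRON']
print(sorted(renamed))  # ['0PAD', 'ADJ', 'NN', 'PRON']
```

One caveat of this workaround: inverse_transform returns the stored class names, so it will give back '0PAD' rather than '[PAD]'.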
Upvotes: 1
Reputation: 16916
No, you cannot do that with LabelEncoder, because it first finds the unique elements and then sorts them to assign the numerical encoding. The relevant lines from the code that its fit method runs:
uniques_set = set(values)
uniques_set, missing_values = _extract_missing(uniques_set)
uniques = sorted(uniques_set)
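A pure-Python sketch of what that sort-then-index step means for the encoding. This only mimics the behaviour and is not scikit-learn's actual code; the helpers sorted_encoding and fixed_order_encoding are hypothetical names. The second helper shows the alternative when '[PAD]' must be pinned to 0: build the mapping yourself from an explicit list instead of relying on LabelEncoder's sorted order.

```python
def sorted_encoding(labels):
    """Mimic LabelEncoder.fit: take the unique labels, sort them, number from 0."""
    uniques = sorted(set(labels))
    return {label: code for code, label in enumerate(uniques)}


def fixed_order_encoding(ordered_labels):
    """Hypothetical alternative: keep the caller's order, so position == code."""
    if len(set(ordered_labels)) != len(ordered_labels):
        raise ValueError("labels must be unique")
    return {label: code for code, label in enumerate(ordered_labels)}


l_labels = ['[PAD]'] + ['NN', 'ADJ', 'PRON']

by_sort = sorted_encoding(l_labels)
print(by_sort['[PAD]'])  # 3, because '[PAD]' sorts after the letters

by_order = fixed_order_encoding(l_labels)
print(by_order['[PAD]'])  # 0, because '[PAD]' is first in l_labels

# Encode a sequence with the fixed-order mapping, then decode it again.
inverse = {code: label for label, code in by_order.items()}
codes = [by_order[t] for t in ['NN', '[PAD]', 'PRON']]
print(codes)                        # [1, 0, 3]
print([inverse[c] for c in codes])  # ['NN', '[PAD]', 'PRON']
```

With an explicit list, the code of each label is simply its position, so '[PAD]' keeps its original name and decoding gives it back unchanged.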
Upvotes: 1