Python sklearn OneHotEncoding categorical and sometimes repeated values

Question

This is my problem with sklearn's OneHotEncoder. With an array a = [1,2,3,4,5,6,7,8,9,22], i.e. ALL UNIQUE values, of a.shape = [10,1] (after reshape(-1,1)), a [10,10] matrix of one-hot encoded values is returned.

array([[ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
   [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
   [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
   [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.]])

But with an array like a = [1,2,2,4,4,6,7,8,9,22], i.e. NON-UNIQUE values, of a.shape = [10,1] (after reshape(-1,1)), a [10,8] matrix of one-hot encoded values is returned.

array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
   [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.]])
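
As far as I can tell, the width of the result simply follows the number of distinct values in the column, not the number of rows. A quick check with NumPy (the variable names here are only for illustration):

```python
import numpy as np

# The two example arrays from above.
a_unique = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 22])
a_repeated = np.array([1, 2, 2, 4, 4, 6, 7, 8, 9, 22])

# One output column is produced per distinct value seen during fitting.
n_unique_cols = np.unique(a_unique).size
n_repeated_cols = np.unique(a_repeated).size

print(n_unique_cols, n_repeated_cols)  # 10 8
```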

But I cannot use this as my input placeholder expects a [10,10] matrix as input. Can anyone help me handle non-unique values in sklearn's OneHotEncoder?

P.S. Adding the parameter n_values=10 raises ValueError: Feature out of bounds for n_values=10
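
The error is probably because n_values=10 tells the encoder to expect values in range(10), and 22 falls outside that range. One possible way around it, assuming a scikit-learn version that has the `categories` parameter (0.20 or later, as far as I know; `n_values` was dropped in later releases) and assuming the full set of possible values is known in advance, is to pass that set explicitly so the output width no longer depends on which values happen to appear. The name `known_values` below is my own placeholder, not part of the library:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Assumption: every value the column can ever take is known up front.
# Numeric categories must be listed in sorted order.
known_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 22]

# The array with repeated values from the question, as a single column.
a = np.array([1, 2, 2, 4, 4, 6, 7, 8, 9, 22]).reshape(-1, 1)

# `categories` takes one list per input column; it fixes the output width.
encoder = OneHotEncoder(categories=[known_values])

# The default output is a sparse matrix, so convert it to a dense array.
encoded = encoder.fit_transform(a).toarray()

print(encoded.shape)  # (10, 10)
```

The columns for values that never occur (3 and 5 here) are simply all zeros, so the matrix keeps the [10,10] shape the placeholder expects.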

