milad cheraghzade

Reputation: 11

One-hot encoding of some integers with the scikit-learn library

(screenshot of the data omitted)

I'm trying to one-hot encode the first column with the following code:

from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(categories = 'auto',sparse = False)
X[:,0] = enc.fit_transform(X[:,0]).toarray()

But I get an error which says I have to reshape my data. How can I one-hot encode the first column and then add it back to the rest of the data?

Upvotes: 1

Views: 235

Answers (1)

Sergey Bushmanov

Reputation: 25189

Your problem is that you're passing a 1d array to OneHotEncoder. Reshape it as a 2d array and you're good to go.
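
To see why: indexing a single column of a 2d array returns a 1d array of shape (n_samples,), while OneHotEncoder expects a 2d array of shape (n_samples, n_features). A quick check of the shapes, using a made-up column just for illustration:

import numpy as np

col = np.array([2, 0, 2, 1])      # a single column pulled out of a 2d array
col.shape                         # (4,)   -> 1d, this is what the encoder rejects
col.reshape(-1, 1).shape          # (4, 1) -> 2d: 4 samples, 1 feature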

Proof

Suppose we happen to have some data that resembles yours:

import numpy as np

np.random.seed(42)
X = np.c_[np.random.randint(0, 3, 10), np.random.randn(10), np.random.randn(10)]
X
array([[ 2.        ,  1.57921282, -1.01283112],
       [ 0.        ,  0.76743473,  0.31424733],
       [ 2.        , -0.46947439, -0.90802408],
       [ 2.        ,  0.54256004, -1.4123037 ],
       [ 0.        , -0.46341769,  1.46564877],
       [ 0.        , -0.46572975, -0.2257763 ],
       [ 2.        ,  0.24196227,  0.0675282 ],
       [ 1.        , -1.91328024, -1.42474819],
       [ 2.        , -1.72491783, -0.54438272],
       [ 2.        , -0.56228753,  0.11092259]])

Then we can proceed as follows:

from sklearn.preprocessing import OneHotEncoder
oho = OneHotEncoder(sparse=False)                    # dense output, no .toarray() needed
oho_enc = oho.fit_transform(X[:, 0].reshape(-1, 1))  # <--- the reshape to 2d is what your code was missing
res = np.c_[oho_enc, X[:,1:]]
res
array([[ 0.        ,  0.        ,  1.        ,  1.57921282, -1.01283112],
       [ 1.        ,  0.        ,  0.        ,  0.76743473,  0.31424733],
       [ 0.        ,  0.        ,  1.        , -0.46947439, -0.90802408],
       [ 0.        ,  0.        ,  1.        ,  0.54256004, -1.4123037 ],
       [ 1.        ,  0.        ,  0.        , -0.46341769,  1.46564877],
       [ 1.        ,  0.        ,  0.        , -0.46572975, -0.2257763 ],
       [ 0.        ,  0.        ,  1.        ,  0.24196227,  0.0675282 ],
       [ 0.        ,  1.        ,  0.        , -1.91328024, -1.42474819],
       [ 0.        ,  0.        ,  1.        , -1.72491783, -0.54438272],
       [ 0.        ,  0.        ,  1.        , -0.56228753,  0.11092259]])
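
Applied to your original snippet, the same fix would look roughly like this (assuming X is a NumPy array as above; note that on recent scikit-learn versions the argument is spelled sparse_output instead of sparse):

import numpy as np
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder(categories='auto', sparse=False)
first_col = enc.fit_transform(X[:, 0].reshape(-1, 1))  # 2d input, dense output
X = np.c_[first_col, X[:, 1:]]                         # dummies replace the original first column

After fitting, enc.categories_ tells you which dummy column corresponds to which original integer.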

Upvotes: 2
