Reputation: 11
I'm trying to do one-hot encoding of the first column with following code:
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(categories = 'auto',sparse = False)
X[:,0] = enc.fit_transform(X[:,0]).toarray()
But I get Error which says I have to reshape my data. How can I one-hot encode the first column and then add it to the whole data ?
Upvotes: 1
Views: 235
Reputation: 25189
Your problem is you're passing to OHE
a 1d array. Reshape is as a 2d and you're fine to go.
Proof
Suppose we happen to have some data that resembles yours:
np.random.seed(42)
X = np.c_[np.random.randint(0,3,10),np.random.randn(10),np.random.randn(10)]
X
array([[ 2. , 1.57921282, -1.01283112],
[ 0. , 0.76743473, 0.31424733],
[ 2. , -0.46947439, -0.90802408],
[ 2. , 0.54256004, -1.4123037 ],
[ 0. , -0.46341769, 1.46564877],
[ 0. , -0.46572975, -0.2257763 ],
[ 2. , 0.24196227, 0.0675282 ],
[ 1. , -1.91328024, -1.42474819],
[ 2. , -1.72491783, -0.54438272],
[ 2. , -0.56228753, 0.11092259]])
Then we can proceed as follows:
from sklearn.preprocessing import OneHotEncoder
oho = OneHotEncoder(sparse = False)
oho_enc = oho.fit_transform(X[:,0].reshape(-1,1)) # <--- you have a problem here
res = np.c_[oho_enc, X[:,1:]]
res
matrix([[ 0. , 0. , 1. , 1.57921282, -1.01283112],
[ 1. , 0. , 0. , 0.76743473, 0.31424733],
[ 0. , 0. , 1. , -0.46947439, -0.90802408],
[ 0. , 0. , 1. , 0.54256004, -1.4123037 ],
[ 1. , 0. , 0. , -0.46341769, 1.46564877],
[ 1. , 0. , 0. , -0.46572975, -0.2257763 ],
[ 0. , 0. , 1. , 0.24196227, 0.0675282 ],
[ 0. , 1. , 0. , -1.91328024, -1.42474819],
[ 0. , 0. , 1. , -1.72491783, -0.54438272],
[ 0. , 0. , 1. , -0.56228753, 0.11092259]])
Upvotes: 2