kmace

Reputation: 2044

Untransform after OneHotEncoder

I'm using sklearn's OneHotEncoder, but I want to untransform my data. Any idea how to do that?

>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder()
>>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])  
>>> enc.n_values_
array([2, 3, 4])
>>> enc.feature_indices_
array([0, 2, 5, 9])
>>> enc.transform([[0, 1, 1]]).toarray()
array([[ 1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.]])

but I want to be able to do the following:

>>> enc.untransform(array([[ 1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.]]))
[[0, 1, 1]]

How would I go about doing this?

For context, I've built a neural network that learns in the one-hot encoded space, and I now want to use the network to make real predictions that need to be in the original data format.

Upvotes: 0

Views: 1178

Answers (1)

bmjrowe

Reputation: 336

To invert a single one-hot encoded feature, see: https://stackoverflow.com/a/39686443/7671913

from sklearn.preprocessing import OneHotEncoder
import numpy as np

orig = np.array([6, 9, 8, 2, 5, 4, 5, 3, 3, 6])

ohe = OneHotEncoder()
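encoded = ohe.fit_transform(orig.reshape(-1, 1))  # input must be a 2-D column

# active_features_ holds the original value behind each output column
# (an attribute of older scikit-learn releases)
decoded = encoded.dot(ohe.active_features_).astype(int)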
assert np.allclose(orig, decoded)
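
Since the question mentions neural-network predictions, which are usually not exact 0/1 vectors, the dot-product trick above may not apply directly to them. Here is a minimal NumPy-only sketch of an alternative, assuming a single feature and taking the most probable column with argmax. The `soft` array is made-up illustrative data standing in for network output, not anything from the original post.

```python
import numpy as np

orig = np.array([6, 9, 8, 2, 5, 4, 5, 3, 3, 6])

# Sorted unique category values and, for each sample, the index of its category.
categories, codes = np.unique(orig, return_inverse=True)

# Build a dense one-hot matrix by hand (one column per category).
one_hot = np.eye(len(categories))[codes]

# Hypothetical "soft" predictions: blend each one-hot row with a uniform distribution.
soft = 0.8 * one_hot + 0.2 / len(categories)

# argmax picks the most likely column; indexing `categories` maps it back to a value.
decoded = categories[soft.argmax(axis=1)]
assert np.array_equal(decoded, orig)
```

The same idea extends to several features by applying argmax to each feature's block of columns separately.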

To invert an array of one-hot encoded items (as stated in the comments), see: How to reverse sklearn.OneHotEncoder transform to recover original data?

Given a fitted sklearn.OneHotEncoder instance ohc, the encoded data out (a scipy.sparse.csr_matrix returned by ohc.fit_transform or ohc.transform), and the shape of the original data (n_samples, n_features), recover the original data X with:

recovered_X = (np.array([ohc.active_features_[col] for col in out.sorted_indices().indices])
               .reshape(n_samples, n_features) - ohc.feature_indices_[:-1])
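
If a newer scikit-learn is available (0.20 or later, as far as I know), the encoder has an `inverse_transform` method that does this directly, without relying on `active_features_` or `feature_indices_`. A minimal sketch, reusing the data from the question:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = np.array([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])

enc = OneHotEncoder()
encoded = enc.fit_transform(X)  # scipy sparse matrix with 2 + 3 + 4 = 9 columns

# inverse_transform maps each one-hot row back to the original category values.
recovered = enc.inverse_transform(encoded)
assert np.array_equal(recovered, X)

# It also accepts a dense array, such as the row shown in the question.
row = np.array([[1., 0., 0., 1., 0., 0., 1., 0., 0.]])
single = enc.inverse_transform(row)
assert single.tolist() == [[0, 1, 1]]
```

This is essentially the `untransform` the question asks for, under the assumption that the installed version provides the method.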

Upvotes: 1
