Reputation: 3539
I need to convert one-hot encoding to categories represented by unique integers. So one-hot encoding created with the following code:
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
labels = [[1],[2],[3]]
enc.fit(labels)
for x in [1,2,3]:
print(enc.transform([[x]]).toarray())
Out:
[[ 1. 0. 0.]]
[[ 0. 1. 0.]]
[[ 0. 0. 1.]]
Could be converted back to a set of unique integers, for example:
[1,2,3] or [11,37, 45] or any other where each integer uniquely represents a single class.
Is it possible to do with scikit-learn or any other python lib?
* Update *
Tried to:
labels = [[1],[2],[3], [4], [5],[6],[7]]
enc.fit(labels)
lst = []
for x in [1,2,3,4,5,6,7]:
lst.append(enc.transform([[x]]).toarray())
lst
Out:
[array([[ 1., 0., 0., 0., 0., 0., 0.]]),
array([[ 0., 1., 0., 0., 0., 0., 0.]]),
array([[ 0., 0., 1., 0., 0., 0., 0.]]),
array([[ 0., 0., 0., 1., 0., 0., 0.]]),
array([[ 0., 0., 0., 0., 1., 0., 0.]]),
array([[ 0., 0., 0., 0., 0., 1., 0.]]),
array([[ 0., 0., 0., 0., 0., 0., 1.]])]
a = np.array(lst)
np.where(a==1)[1]
Out:
array([0, 0, 0, 0, 0, 0, 0], dtype=int64)
Not what I need
Upvotes: 6
Views: 13364
Reputation: 564
Since you are using sklearn.preprocessing.OneHotEncoder
to 'encode' the data, you can use its .inverse_transform()
method to 'decode' the data (I think this requires .__version__ = 0.20.1
or newer):
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
labels = [[1],[2],[3]]
encoder = enc.fit(labels)
encoded_labels = encoder.transform(labels)
decoded_labels = encoder.inverse_transform(encoded_labels)
decoded_labels # array([[1],
[2],
[3]])
n.b. decoded_labels is a numpy array not a list.
Upvotes: 0
Reputation: 2629
You can use np.argmax()
:
from sklearn.preprocessing import OneHotEncoder
import numpy as np
enc = OneHotEncoder()
labels = [[1],[2],[3]]
enc.fit(labels)
x = enc.transform(labels).toarray()
# x = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
xr = (np.argmax(x, axis=1)+1).reshape(-1, 1)
print(xr)
This should return array([[1], [2], [3]])
. If you want instead array([[0], [1], [2]])
, just remove the +1
in the definition of xr
.
Upvotes: 2
Reputation: 19634
You can do that using np.where
as follows:
import numpy as np
a=np.array([[ 0., 1., 0.],
[ 1., 0., 0.],
[ 0., 0., 1.]])
np.where(a==1)[1]
This prints array([1, 0, 2], dtype=int64)
. This works since np.where(a==1)[1]
returns the column indices of the 1
's, which are exactly the labels.
In addition, since a
is a 0,1
-matrix, you can also replace np.where(a==1)[1]
with just np.where(a)[1]
.
Update: The following solution should work with your format:
l=[np.array([[ 1., 0., 0., 0., 0., 0., 0.]]),
np.array([[ 0., 0., 1., 0., 0., 0., 0.]]),
np.array([[ 0., 1., 0., 0., 0., 0., 0.]]),
np.array([[ 0., 0., 0., 0., 1., 0., 0.]]),
np.array([[ 0., 0., 0., 0., 1., 0., 0.]]),
np.array([[ 0., 0., 0., 0., 0., 1., 0.]]),
np.array([[ 0., 0., 0., 0., 0., 0., 1.]])]
a=np.array(l)
np.where(a)[2]
This prints
array([0, 2, 1, 4, 4, 5, 6], dtype=int64)
Alternativaly, you could use the original solution together with @ml4294's comment.
Upvotes: 8