Reputation: 2232
I have following numpy array:
array([1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 2, 0, 1, 1, 2, 3, 3, 3, 3, 1, 1, 1, 1,
1, 3, 1, 1, 3, 0, 1, 3, 1, 2, 1, 1, 1, 1, 1, 2, 1, 2, 0, 1, 2, 0, 2,
2, 2, 1, 2, 2, 0, 2, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 3, 0, 2, 1, 1,
1, 1, 3, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 0, 2, 3,
2, 1, 1, 1, 1, 3, 1, 0])
Question: How can I create another array that encodes the data, given condition: If value = 3 or 2, then "1", else "0".
I tried:
from sklearn.preprocessing import label_binarize
label_binarize(doc_topics, classes=[3,2])[:15]
array([[0, 0],
[0, 0],
[0, 0],
[1, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 1],
[0, 0],
[0, 0],
[0, 0],
[0, 1]])
However, this seems to return a 2-D array.
Upvotes: 1
Views: 96
Reputation: 394071
Use np.where
and pass your condition to mask the elements of interest to set where the condition is met to 1
, 0
otherwise:
In[18]:
a = np.where((a==3) | (a == 2),1,0)
a
Out[18]:
array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0,
0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1,
0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0])
Here we compare the array with the desired values, and use the unary |
to or
the conditions, due to operator precedence we have to use parentheses ()
around the conditions.
To do this using sklearn:
In[68]:
binarizer = preprocessing.Binarizer(threshold=1)
binarizer.transform(a.reshape(1,-1))
Out[68]:
array([[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0,
0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1,
0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0]])
This treats values above 1
as 1
and 0
otherwise, this only works for this specific data set as you want 2
and 3
to be 1
, it won't work if you have other values, so the numpy
method is more general
Upvotes: 1