Christopher

Reputation: 2232

Numpy array: Conditional encoding

I have the following numpy array:

array([1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 2, 0, 1, 1, 2, 3, 3, 3, 3, 1, 1, 1, 1,
       1, 3, 1, 1, 3, 0, 1, 3, 1, 2, 1, 1, 1, 1, 1, 2, 1, 2, 0, 1, 2, 0, 2,
       2, 2, 1, 2, 2, 0, 2, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 3, 0, 2, 1, 1,
       1, 1, 3, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 0, 2, 3,
       2, 1, 1, 1, 1, 3, 1, 0])

Question: How can I create another array that encodes the data according to this condition: if the value is 3 or 2, then 1, else 0?

I tried:

from sklearn.preprocessing import label_binarize
label_binarize(doc_topics, classes=[3,2])[:15]

array([[0, 0],
       [0, 0],
       [0, 0],
       [1, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 1],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 1]])

However, this seems to return a 2-D array.

Upvotes: 1

Views: 96

Answers (1)

EdChum

Reputation: 394071

Use np.where and pass your condition as the mask, so that elements where the condition is met become 1 and all others become 0:

In[18]:
import numpy as np
np.where((a == 3) | (a == 2), 1, 0)  # a itself is left unchanged

Out[18]: 
array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0,
       0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1,
       0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
       1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
       0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0])

Here we compare the array with each desired value and combine the two conditions with the binary | operator (element-wise OR). Because | binds more tightly than ==, each comparison has to be wrapped in parentheses ().
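
As a minimal sketch of the same idea, here is the comparison on a short made-up array (`sample` is hypothetical, not the data from the question). It also shows np.isin, which should give the same mask without needing the parentheses:

```python
import numpy as np

# Hypothetical short sample, not the array from the question
sample = np.array([1, 1, 1, 3, 1, 2, 0, 3])

# Parenthesised comparisons combined with element-wise OR
with_where = np.where((sample == 3) | (sample == 2), 1, 0)

# Same membership test via np.isin, cast from bool to int
with_isin = np.isin(sample, [2, 3]).astype(int)

print(with_where)  # [0 0 0 1 0 1 0 1]
print(with_isin)   # [0 0 0 1 0 1 0 1]
```

np.isin tends to read better once the list of values to match grows beyond two or three.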

To do this using sklearn:

In[68]:
from sklearn import preprocessing
binarizer = preprocessing.Binarizer(threshold=1)
binarizer.transform(a.reshape(1, -1))  # a is the original array here

Out[68]: 
array([[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0,
        0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1,
        0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
        1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
        0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0]])

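This maps values above 1 to 1 and everything else to 0. It only works for this specific data set, because the values you want encoded as 1 (2 and 3) happen to be the only ones greater than 1. It won't work if the array contains other values above the threshold, so the numpy method is more general.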
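
To illustrate the difference, here is a rough sketch on made-up values (`sample` is hypothetical). The threshold is emulated with a plain `> 1` comparison, which I'm assuming matches what Binarizer(threshold=1) does, so the snippet needs only numpy:

```python
import numpy as np

# Hypothetical values, including a 4 that the question's data does not contain
sample = np.array([0, 1, 2, 3, 4])

# Emulates a threshold of 1: everything above 1 becomes 1
thresholded = (sample > 1).astype(int)

# Explicit membership: only 2 and 3 become 1
membership = np.isin(sample, [2, 3]).astype(int)

print(thresholded)  # [0 0 1 1 1]
print(membership)   # [0 0 1 1 0]
```

The 4 is encoded as 1 by the threshold but as 0 by the membership test, which is why the explicit condition is the safer choice.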

Upvotes: 1
