aura

Reputation: 543

Vectorized approach to change a 1-dimensional boolean array to a 2-dimensional one, Python 3

I have a boolean vector of dimension 1 * n; suppose n = 6.

vec = [1, 0, 1, 0, 0, 1]

I want to change it to an n * 2 matrix. For each element in vec, if it is 1, the corresponding row of the matrix should be [1, 0]; if it is 0, the corresponding row should be [0, 1]. So the resulting matrix should be

matr = [[1, 0],
        [0, 1],
        [1, 0],
        [0, 1],
        [0, 1],
        [1, 0]]

To convert the vector to the matrix, I need an elegant vectorized approach (avoiding for-loops), since in the real case n will be much larger than 6. The conversion is for machine learning classification: vec holds binary classification labels, and matr will be used for categorical classification. Maybe this information makes my question more specific.

I use Python 3, numpy/scipy, sklearn.

Can anyone help me with it? Thanks.

Upvotes: 2

Views: 71

Answers (2)

Divakar

Reputation: 221504

Here's one approach with array indexing. We use a 2D lookup array whose two rows are the outputs for 0 and 1, and index into it with vec. For the indexing part, np.take is very efficient with such repeated indices. The implementation would look something like this:

import numpy as np

mapping = np.array([[0,1],[1,0]])  # row 0 is the output for 0, row 1 for 1
out = np.take(mapping, vec, axis=0)

Sample run:

In [115]: vec = np.array([1, 0, 1, 0, 0, 1])

In [116]: np.take(np.array([[0,1],[1,0]]), vec, axis=0)
Out[116]: 
array([[1, 0],
       [0, 1],
       [1, 0],
       [0, 1],
       [0, 1],
       [1, 0]])
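For reference, the same lookup can also be written with plain integer-array indexing. The sketch below checks that both forms agree on the sample vector; the helper name one_hot_pairs and the constant MAPPING are made up for illustration and are not part of NumPy.

```python
import numpy as np

# Row 0 is the output for a 0 entry, row 1 the output for a 1 entry.
MAPPING = np.array([[0, 1], [1, 0]])

def one_hot_pairs(vec):
    """Hypothetical helper: map each 0/1 entry of vec to a row of MAPPING."""
    vec = np.asarray(vec)  # also accepts a plain Python list
    return np.take(MAPPING, vec, axis=0)

vec = [1, 0, 1, 0, 0, 1]
out = one_hot_pairs(vec)

# Integer-array indexing selects the same rows as np.take along axis 0.
assert np.array_equal(out, MAPPING[np.asarray(vec)])
print(out.shape)  # (6, 2)
```

Which of the two spellings is faster may depend on the NumPy version and the array size, so it is worth timing both on your own data.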

Runtime test on a bigger dataset:

In [108]: vec = np.random.randint(0,2,(10000000))

# @Jon Clements's solution
In [109]: %timeit np.stack((vec, vec ^ 1), axis=1)
10 loops, best of 3: 50.2 ms per loop

# @Warren Weckesser's suggested solution
In [110]: %timeit vec[:,None] ^ [0, 1]
10 loops, best of 3: 90 ms per loop

# Proposed in this post
In [111]: %timeit np.take(np.array([[0,1],[1,0]]), vec, axis=0)
10 loops, best of 3: 31 ms per loop
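To repeat this comparison outside IPython, something like the following sketch with the standard timeit module should work. It uses a smaller array than the run above so it finishes quickly, and the dictionary of candidates is just one way to organise it. Absolute numbers will differ by machine and NumPy version, so none are quoted here.

```python
import timeit
import numpy as np

# Smaller than the 10,000,000 elements used above so the script runs quickly.
rng = np.random.default_rng(0)
vec = rng.integers(0, 2, size=100_000)
mapping = np.array([[0, 1], [1, 0]])

candidates = {
    "stack": lambda: np.stack((vec, vec ^ 1), axis=1),
    "broadcast": lambda: vec[:, None] ^ [0, 1],
    "take": lambda: np.take(mapping, vec, axis=0),
}

# All three candidates must agree before their timings are worth comparing.
results = {name: fn() for name, fn in candidates.items()}
assert all(np.array_equal(results["take"], r) for r in results.values())

for name, fn in candidates.items():
    # Best of 3 repeats, each repeat averaging over 10 calls.
    seconds = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```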

Upvotes: 3

Jon Clements

Reputation: 142106

Assuming that vec is a numpy.array:

vec = np.array([1, 0, 1, 0, 0, 1])

You can then stack it column-wise with a bitwise XOR'd copy of itself, which flips the values from 0->1 and 1->0, e.g.:

out = np.stack((vec, vec ^ 1), axis=1)

Gives you:

array([[1, 0],
       [0, 1],
       [1, 0],
       [0, 1],
       [0, 1],
       [1, 0]])
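Note that the vec in the question is a plain Python list, and ^ is not defined between a list and an integer, so it has to be converted first. A minimal sketch of that, where the function name to_two_columns is hypothetical:

```python
import numpy as np

def to_two_columns(vec):
    """Hypothetical helper: stack vec next to its bit-flipped copy."""
    vec = np.asarray(vec)  # list ^ int raises TypeError, so convert first
    return np.stack((vec, vec ^ 1), axis=1)

matr = to_two_columns([1, 0, 1, 0, 0, 1])
print(matr.tolist())  # [[1, 0], [0, 1], [1, 0], [0, 1], [0, 1], [1, 0]]
```

This assumes vec holds integers (or booleans); XOR is not defined for floating-point arrays, where 1 - vec would be the equivalent flip.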

Thanks to Warren Weckesser for suggesting, in a comment, a broadcasting approach that was faster in the basic timings below:

vec[:,None] ^ [0, 1]
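In case the one-liner looks opaque, here is a sketch of what the broadcasting does step by step. The intermediate names column and flips are only for illustration.

```python
import numpy as np

vec = np.array([1, 0, 1, 0, 0, 1])

column = vec[:, None]     # shape (6, 1): one row per element of vec
flips = np.array([0, 1])  # shape (2,): XOR with 0 keeps a bit, XOR with 1 flips it

out = column ^ flips      # broadcasts (6, 1) against (2,) to give (6, 2)
print(column.shape, flips.shape, out.shape)  # (6, 1) (2,) (6, 2)

# Column 0 is vec unchanged, column 1 is vec with every bit flipped.
assert np.array_equal(out[:, 0], vec)
assert np.array_equal(out[:, 1], 1 - vec)
```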

Basic timings (a is the input array):

In [33]: %timeit np.stack((a, a ^ 1), axis=1)
15.6 µs ± 199 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [34]: %timeit a[:,None] ^ [0, 1]
7.4 µs ± 45.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Upvotes: 4
