Reputation: 543
I have a boolean vector of dimension 1 * n; suppose n = 6.
vec = [1, 0, 1, 0, 0, 1]
I want to convert it to an n * 2 matrix. For each element in vec, if it is 1, the corresponding row of the matrix should be [1, 0]; if it is 0, the corresponding row should be [0, 1]. So the resulting matrix should be
matr = [[1, 0],
[0, 1],
[1, 0],
[0, 1],
[0, 1],
[1, 0]]
To convert the vector to the matrix, I need an elegant vectorized approach (avoiding for-loops), since in the real case n would be much larger than 6.
The reason for this conversion is machine learning classification. vec holds binary classification labels, and matr will be used for categorical classification. Maybe this information makes my question more specific.
I use Python 3, numpy/scipy, sklearn.
Can anyone help me with it? Thanks.
Upvotes: 2
Views: 71
Reputation: 221504
Here's one approach with array indexing. Basically, we would use a 2D array with two sub-arrays as the mapping for the 0 and 1 values in vec. For the indexing part, np.take is very efficient for such repeated indices. The implementation would look something like this -
import numpy as np
mapping = np.array([[0, 1], [1, 0]])  # row 0 -> [0, 1], row 1 -> [1, 0]
out = np.take(mapping, vec, axis=0)   # pick one mapping row per element of vec
Sample run -
In [115]: vec = np.array([1, 0, 1, 0, 0, 1])
In [116]: np.take(np.array([[0,1],[1,0]]), vec, axis=0)
Out[116]:
array([[1, 0],
[0, 1],
[1, 0],
[0, 1],
[0, 1],
[1, 0]])
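The same idea as a self-contained sketch. The helper name to_two_columns is hypothetical, and the cast to an integer index type is my own assumption for the case where vec is stored with a boolean dtype rather than as 0/1 integers (an integer index is what the lookup needs). Plain fancy indexing, mapping[idx], gives the same rows as np.take here.

```python
import numpy as np

def to_two_columns(vec):
    """Map each 1 to the row [1, 0] and each 0 to the row [0, 1]."""
    # Assumption: vec may be bool or 0/1 ints; cast so it is a valid row index.
    idx = np.asarray(vec).astype(np.intp)
    # Row 0 of the mapping is used for vec == 0, row 1 for vec == 1.
    mapping = np.array([[0, 1], [1, 0]])
    return np.take(mapping, idx, axis=0)

vec = np.array([True, False, True, False, False, True])
out = to_two_columns(vec)

# Equivalent lookup with ordinary integer-array indexing.
idx = vec.astype(np.intp)
same = np.array([[0, 1], [1, 0]])[idx]
```

Both out and same have shape (6, 2), matching matr from the question.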
Runtime test on bigger dataset -
In [108]: vec = np.random.randint(0,2,(10000000))
# @Jon Clements's soln
In [109]: %timeit np.stack((vec, vec ^ 1), axis=1)
10 loops, best of 3: 50.2 ms per loop
# @Warren Weckesser's suggested soln
In [110]: %timeit vec[:,None] ^ [0, 1]
10 loops, best of 3: 90 ms per loop
# Proposed in this post
In [111]: %timeit np.take(np.array([[0,1],[1,0]]), vec, axis=0)
10 loops, best of 3: 31 ms per loop
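For anyone who wants to rerun the comparison outside IPython, here is a rough sketch using the standard timeit module. The array size, the seed, and the repeat counts are arbitrary choices of mine, and the numbers will differ by machine and NumPy version, so treat it as a way to check the ordering on your own setup rather than as a benchmark result.

```python
import timeit
import numpy as np

# Assumption: 1,000,000 random 0/1 labels is large enough to be representative.
rng = np.random.default_rng(0)
vec = rng.integers(0, 2, size=1_000_000)
mapping = np.array([[0, 1], [1, 0]])

# The three approaches discussed in this thread.
candidates = {
    "stack": lambda: np.stack((vec, vec ^ 1), axis=1),
    "broadcast": lambda: vec[:, None] ^ [0, 1],
    "take": lambda: np.take(mapping, vec, axis=0),
}

# Compute each result once so the outputs can be compared for equality.
results = {name: fn() for name, fn in candidates.items()}

for name, fn in candidates.items():
    # Best of 3 repeats, 5 calls each, reported per call.
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{name:>9}: {best * 1e3:.2f} ms per call")
```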
Upvotes: 3
Reputation: 142106
Assuming that vec is a numpy.array:
vec = np.array([1, 0, 1, 0, 0, 1])
You can then stack it column-wise with a copy of itself XOR'd with 1 (which flips the values 0->1 and 1->0), e.g.:
out = np.stack((vec, vec ^ 1), axis=1)
Gives you:
array([[1, 0],
[0, 1],
[1, 0],
[0, 1],
[0, 1],
[1, 0]])
Thanks to Warren Weckesser for suggesting, in a comment, a broadcasting approach that was faster in the timings below:
vec[:,None] ^ [0, 1]
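To spell out why the one-liner works, here is a small sketch with the intermediate shapes. The names col and flip are only for illustration. vec[:, None] is a column of shape (n, 1); XOR-ing it with [0, 1] broadcasts to (n, 2), so the first column is vec ^ 0 (unchanged) and the second is vec ^ 1 (flipped).

```python
import numpy as np

vec = np.array([1, 0, 1, 0, 0, 1])

col = vec[:, None]        # shape (6, 1): one row per element of vec
flip = np.array([0, 1])   # shape (2,): XOR with 0 keeps a bit, XOR with 1 flips it
out = col ^ flip          # broadcasts (6, 1) against (2,) to shape (6, 2)

first_column = out[:, 0]   # vec ^ 0, i.e. vec itself
second_column = out[:, 1]  # vec ^ 1, i.e. the flipped values
```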
Basic timings:
In [33]: %timeit np.stack((a, a ^ 1), axis=1)
15.6 µs ± 199 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [34]: %timeit a[:,None] ^ [0, 1]
7.4 µs ± 45.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Upvotes: 4