In Python, I want to multiply a binary matrix (each element is 0 or 1) with a binary vector. The size of the matrix is something like 100000 x 100 and the vector has 100 elements. If V and x are respectively the matrix and the vector (initialized as bool), this simple code can do the job:
V.astype(np.int8).dot(x.astype(np.int8))
My question is: is there a way to do it faster by taking advantage of the binary nature of the two operands? After all, each of the 100000 output elements is the sum of the logical AND between one row of V and the vector x.
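For concreteness, here is a small self-contained sanity check of that equivalence (the sizes and the seed are only illustrative, not my actual data):

import numpy as np

# Small bool matrix and vector standing in for the 100000 x 100 case.
np.random.seed(0)
V = np.random.randint(2, size=(5, 4)).astype(bool)
x = np.random.randint(2, size=4).astype(bool)

# Each output element counts the positions where a row of V and x are both True.
via_dot = V.astype(np.int8).dot(x.astype(np.int8))
via_and = np.logical_and(V, x).sum(axis=1)

assert np.array_equal(via_dot, via_and)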
Thank you for any help. Patrick
EDIT: I just tried your solution, unutbu. Strangely enough, the performance of the two methods (int8 and einsum) seems to be similar on my machine.
Upvotes: 1
Views: 150
You could use np.einsum:
In [12]: V = np.random.randint(2, size=(100000,100))
In [14]: x = np.random.randint(2, size=100)
In [15]: expected = V.astype(np.int8).dot(x.astype(np.int8))
In [28]: result = np.einsum('ij,j->i', V, x)
The result is the same:
In [29]: np.allclose(expected, result)
Out[29]: True
np.einsum is about 3x faster:
In [30]: %timeit np.einsum('ij,j->i', V, x)
100 loops, best of 3: 6.92 ms per loop
In [25]: %timeit V.astype(np.int8).dot(x.astype(np.int8))
10 loops, best of 3: 22.4 ms per loop
Note: Unlike your original code, np.einsum returns an array of dtype int32 in this case.
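If matching the original int8 dtype matters, one option (a sketch, not part of the answer above) is to cast the operands to int8 before calling np.einsum; with only 100 columns the row sums stay well below 127, so int8 cannot overflow here:

V8 = V.astype(np.int8)
x8 = x.astype(np.int8)

result8 = np.einsum('ij,j->i', V8, x8)
# result8.dtype is int8, matching V.astype(np.int8).dot(x.astype(np.int8))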
You could try to use np.logical_and and np.sum, but doing so is much slower:
In [45]: result2 = np.logical_and(V, x).sum(axis=1)
In [46]: np.allclose(expected, result2)
Out[46]: True
In [47]: %timeit np.logical_and(V, x).sum(axis=1)
10 loops, best of 3: 78 ms per loop
This is likely slower because doing the calculation in two steps requires forming an intermediate array, np.logical_and(V, x), of shape (100000, 100). Calling np.einsum lets the underlying C function compute the result directly.
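As a rough illustration of that intermediate cost (a back-of-the-envelope check, not part of the timings above), the temporary boolean array alone is about 10 MB:

tmp = np.logical_and(V, x)               # shape (100000, 100), dtype bool
print(tmp.shape, tmp.dtype, tmp.nbytes)  # 100000 * 100 bytes = 10,000,000 bytes (~9.5 MB)

np.einsum, by contrast, can accumulate each row's sum as it goes, so no array of that size is ever materialized.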
Upvotes: 1