Reputation: 1998
I have 2 arrays of boolean masks and I am looking to calculate an operation on every combination of two masks.
The slow version
N = 10000
M = 580
masksA = np.array(np.random.randint(0,2, size=(N,M)), dtype=np.bool)
masksB = np.array(np.random.randint(0,2, size=(N,M)), dtype=np.bool)
result = np.zeros(shape=(N,N), dtype=np.float)
for i in range(N):
for j in range(N):
result[i,j] = np.float64(np.count_nonzero(masksB[i,:]==masksB[j,:])) / M
The faster version
for i in range(N):
result[i,:] = np.array(np.count_nonzero(masksB[i]==masksA, axis=1), dtype=np.float64) / M
Can this be done even faster with a single loop?
Upvotes: 3
Views: 111
Reputation: 221504
That's basically outer-equality comparison keeping the first of those mask axes aligned. We can leverage matrix-multiplication
with the idea that the matrix-multiplication of the masks themselves and their simulataneous negated versions would lead us to those outer equality-summations. Hence, for the summations of equalities, simply do -
out = (~masksB).astype(int).dot(~masksA.T) + masksB.astype(int).dot(masksA.T)
Alternatively, get the int
version and use it to get the negated version too -
mB = masksB.astype(int)
out = (1-mB).dot(~masksA.T) + mB.dot(masksA.T)
The final output would be out/float(M)
. Or we can replace those int
conversions with float
ones and then divide the output by M
.
Timings (with N as 1000 instead) to get equality summations
# Setup
In [39]: np.random.seed(0)
...: N = 1000
...: M = 580
...: masksA = np.array(np.random.randint(0,2, size=(N,M)), dtype=np.bool)
...: masksB = np.array(np.random.randint(0,2, size=(N,M)), dtype=np.bool)
# Approach #1 with int dtype converison
In [40]: %timeit (~masksB).astype(int).dot(~masksA.T) + masksB.astype(int).dot(masksA.T)
1 loop, best of 3: 870 ms per loop
# Approach #1 with float dtype converison
In [41]: %timeit (~masksB).astype(float).dot(~masksA.T) + masksB.astype(float).dot(masksA.T)
10 loops, best of 3: 74 ms per loop
# Approach #2 with int dtype converison
In [42]: %%timeit
...: mB = masksB.astype(int)
...: out = (1-mB).dot(~masksA.T) + mB.dot(masksA.T)
1 loop, best of 3: 882 ms per loop
# Approach #2 with float dtype converison
In [43]: %%timeit
...: mB = masksB.astype(float)
...: out = (1-mB).dot(~masksA.T) + mB.dot(masksA.T)
10 loops, best of 3: 59.3 ms per loop
So, the preferred way would be to go with the float conversion on the second approach and divide the output by M
.
Use one matrix-multiplication with stacking
Since the equality summations would essentially be int
numbers, we can use a lower precision dtype for the matrix-multiplication. Also, one more idea would be to stack the original masks with their negated versions and then perform the matrix-multiplication. Hence, for equality summations, we would have -
m1 = np.hstack((masksA,~masksA))
m2 = np.hstack((masksB,~masksB))
out = m2.astype(np.float32).dot(m1.T)
Timings (same setup as earlier) -
In [49]: %%timeit
...: m1 = np.hstack((masksA,~masksA))
...: m2 = np.hstack((masksB,~masksB))
...: out = m2.astype(np.float32).dot(m1.T)
10 loops, best of 3: 36.8 ms per loop
Upvotes: 3
Reputation: 53029
We can effectively cut @Divakar's (current) best time in two by using only one matrix multiplication:
import numpy as np
N = 1000
M = 580
masksA = np.array(np.random.randint(0,2, size=(N,M)), dtype=np.bool)
masksB = np.array(np.random.randint(0,2, size=(N,M)), dtype=np.bool)
def f_pp():
return np.where(masksB,1.0,-1.0)@np.where(masksA,0.5/M,-0.5/M).T+0.5
def f_pp_2():
return (np.where(masksB,1.0,-1.0)@np.where(masksA,0.5,-0.5).T+0.5*M)*(1/M)
def f_dv_2():
mB = masksB.astype(float)
return ((1-mB).dot(~masksA.T) + mB.dot(masksA.T))*(1/M)
assert np.allclose(f_pp(),f_dv_2())
assert np.all(f_pp_2()==f_dv_2())
from timeit import timeit
print("PP ",timeit(f_pp,number=100)*10,"ms")
print("PP 2 ",timeit(f_pp_2,number=100)*10,"ms")
print("Divakar",timeit(f_dv_2,number=100)*10,"ms")
Sample run:
PP 31.41063162125647 ms
PP 2 31.757128546014428 ms
Divakar 63.400797033682466 ms
How does it work? By mapping True and False to opposite numbers x
and -x
(a different x
can be used for the two factors; let's say a
for masksA and b
for masksB) we get in the dot product a term ab
for each position where the masks agree and a term -ab
for each position where they are different. If k
is the number of equal bits between a pair vectors then their dot product will be kab-(M-k)ab = 2kab - Mab
. Choosing a
and b
such that 2Mab = 1
this becomes k/M - 1/2
which is up to a constant offset our desired result.
Upvotes: 2