user1581390
user1581390

Reputation: 1998

How to Vectorize This 2D Matrix Operation?

I have 2 arrays of boolean masks and I am looking to calculate an operation on every combination of two masks.

The slow version

N = 10000
M = 580
masksA = np.array(np.random.randint(0,2, size=(N,M)), dtype=np.bool)
masksB = np.array(np.random.randint(0,2, size=(N,M)), dtype=np.bool)

result = np.zeros(shape=(N,N), dtype=np.float)
for i in range(N):
    for j in range(N):
        result[i,j] = np.float64(np.count_nonzero(masksB[i,:]==masksB[j,:])) / M

The faster version

for i in range(N):
    result[i,:] = np.array(np.count_nonzero(masksB[i]==masksA, axis=1), dtype=np.float64) / M

Can this be done even faster with a single loop?

Upvotes: 3

Views: 111

Answers (2)

Divakar
Divakar

Reputation: 221504

That's basically outer-equality comparison keeping the first of those mask axes aligned. We can leverage matrix-multiplication with the idea that the matrix-multiplication of the masks themselves and their simulataneous negated versions would lead us to those outer equality-summations. Hence, for the summations of equalities, simply do -

out = (~masksB).astype(int).dot(~masksA.T) + masksB.astype(int).dot(masksA.T)

Alternatively, get the int version and use it to get the negated version too -

mB = masksB.astype(int)
out = (1-mB).dot(~masksA.T) + mB.dot(masksA.T)

The final output would be out/float(M). Or we can replace those int conversions with float ones and then divide the output by M.

Timings (with N as 1000 instead) to get equality summations

# Setup
In [39]: np.random.seed(0)
    ...: N = 1000
    ...: M = 580
    ...: masksA = np.array(np.random.randint(0,2, size=(N,M)), dtype=np.bool)
    ...: masksB = np.array(np.random.randint(0,2, size=(N,M)), dtype=np.bool)

# Approach #1 with int dtype converison
In [40]: %timeit (~masksB).astype(int).dot(~masksA.T) + masksB.astype(int).dot(masksA.T)
1 loop, best of 3: 870 ms per loop

# Approach #1 with float dtype converison
In [41]: %timeit (~masksB).astype(float).dot(~masksA.T) + masksB.astype(float).dot(masksA.T)
10 loops, best of 3: 74 ms per loop

# Approach #2 with int dtype converison
In [42]: %%timeit
    ...: mB = masksB.astype(int)
    ...: out = (1-mB).dot(~masksA.T) + mB.dot(masksA.T)
1 loop, best of 3: 882 ms per loop

# Approach #2 with float dtype converison
In [43]: %%timeit
    ...: mB = masksB.astype(float)
    ...: out = (1-mB).dot(~masksA.T) + mB.dot(masksA.T)
10 loops, best of 3: 59.3 ms per loop

So, the preferred way would be to go with the float conversion on the second approach and divide the output by M.

Use one matrix-multiplication with stacking

Since the equality summations would essentially be int numbers, we can use a lower precision dtype for the matrix-multiplication. Also, one more idea would be to stack the original masks with their negated versions and then perform the matrix-multiplication. Hence, for equality summations, we would have -

m1 = np.hstack((masksA,~masksA))
m2 = np.hstack((masksB,~masksB))
out = m2.astype(np.float32).dot(m1.T)

Timings (same setup as earlier) -

In [49]: %%timeit
    ...: m1 = np.hstack((masksA,~masksA))
    ...: m2 = np.hstack((masksB,~masksB))
    ...: out = m2.astype(np.float32).dot(m1.T)
10 loops, best of 3: 36.8 ms per loop

Upvotes: 3

Paul Panzer
Paul Panzer

Reputation: 53029

We can effectively cut @Divakar's (current) best time in two by using only one matrix multiplication:

import numpy as np

N = 1000
M = 580
masksA = np.array(np.random.randint(0,2, size=(N,M)), dtype=np.bool)
masksB = np.array(np.random.randint(0,2, size=(N,M)), dtype=np.bool)

def f_pp():
    return np.where(masksB,1.0,-1.0)@np.where(masksA,0.5/M,-0.5/M).T+0.5

def f_pp_2():
    return (np.where(masksB,1.0,-1.0)@np.where(masksA,0.5,-0.5).T+0.5*M)*(1/M)

def f_dv_2():
    mB = masksB.astype(float)
    return ((1-mB).dot(~masksA.T) + mB.dot(masksA.T))*(1/M)

assert np.allclose(f_pp(),f_dv_2())
assert np.all(f_pp_2()==f_dv_2())

from timeit import timeit

print("PP     ",timeit(f_pp,number=100)*10,"ms")
print("PP 2   ",timeit(f_pp_2,number=100)*10,"ms")
print("Divakar",timeit(f_dv_2,number=100)*10,"ms")

Sample run:

PP      31.41063162125647 ms
PP 2    31.757128546014428 ms
Divakar 63.400797033682466 ms

How does it work? By mapping True and False to opposite numbers x and -x (a different x can be used for the two factors; let's say a for masksA and b for masksB) we get in the dot product a term ab for each position where the masks agree and a term -ab for each position where they are different. If k is the number of equal bits between a pair vectors then their dot product will be kab-(M-k)ab = 2kab - Mab. Choosing a and b such that 2Mab = 1 this becomes k/M - 1/2 which is up to a constant offset our desired result.

Upvotes: 2

Related Questions