Efficiently compute pairwise equal for NumPy arrays

Question

Given two NumPy arrays, say:

import numpy as np
import numpy.random as rand

n = 1000
x = rand.binomial(n=1, p=.5, size=(n, 10))
y = rand.binomial(n=1, p=.5, size=(n, 10))

Is there a more efficient way to compute X in the following:

X = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        X[i, j] = 1 * np.all(x[i] == y[j])

Divakar · Accepted Answer

Approach #1 : Input arrays with 0s & 1s

For input arrays with 0s and 1s only, we can reduce each of their rows to scalars and hence the input arrays to 1D and then leverage broadcasting, like so -

n = x.shape[1]        
s = 2**np.arange(n)
x1D = x.dot(s)
y1D = y.dot(s)
Xout = (x1D[:,None] == y1D).astype(float)

Approach #2 : Generic case

For a generic case, we can use views -

# https://stackoverflow.com/a/45313353/ @Divakar
def view1D(a, b): # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(),  b.view(void_dt).ravel()

x1D, y1D = view1D(x, y)
Xout = (x1D[:,None] == y1D).astype(float)

Runtime test

# Setup
In [287]: np.random.seed(0)
     ...: n = 1000
     ...: x = rand.binomial(n=1, p=.5, size=(n, 10))
     ...: y = rand.binomial(n=1, p=.5, size=(n, 10))

# Original approach
In [288]: %%timeit
     ...: X = np.zeros((n, n))
     ...: for i in range(n):
     ...:     for j in range(n):
     ...:         X[i, j] = 1 * np.all(x[i] == y[j])
1 loop, best of 3: 4.69 s per loop

# Approach #1
In [290]: %%timeit
     ...: n = x.shape[1]        
     ...: s = 2**np.arange(n)
     ...: x1D = x.dot(s)
     ...: y1D = y.dot(s)
     ...: Xout = (x1D[:,None] == y1D).astype(float)
1000 loops, best of 3: 1.42 ms per loop

# Approach #2
In [291]: %%timeit
     ...: x1D, y1D = view1D(x, y)
     ...: Xout = (x1D[:,None] == y1D).astype(float)
100 loops, best of 3: 18.5 ms per loop

Efficiently compute pairwise equal for NumPy arrays

Answers (1)

Related Questions