Darkmoor

Reputation: 902

Efficient pairwise comparison between a row array and a column array

Given a row vector a = np.array([1, 2, 3]) and a column vector b = np.array([[1], [2], [3]]), we can compare every pair of elements by executing c = a == b, which returns

>>> c
array([[ True, False, False],
       [False,  True, False],
       [False, False,  True]])

However, when the number of elements is very large, the dense result demands a lot of memory. Is it possible to use the sparse matrices a and b below and compute a sparse c matrix efficiently?

import numpy as np
from scipy.sparse import csr_matrix
data = np.array([1, 2, 3])
row = np.array([0, 1, 2])
col = np.array([0, 0, 0])

a = csr_matrix((data, (row, col)), shape=(3, 1))
b = csr_matrix((data, (col, row)), shape=(1, 3))

Upvotes: 1

Views: 166

Answers (1)

dermen

Reputation: 5372

To speed up a==b you can use numexpr.evaluate('a==b'). However, this won't eliminate the memory burden, because the full dense boolean result is still allocated.

Instead, you can store the indices of where a==b is True:

In [5]: import numexpr

In [6]: import numpy as np

In [7]: a = np.array([1, 2, 3, 4])

In [8]: b = np.array([[1], [3], [5]])

In [9]: np.where(numexpr.evaluate('a==b'))  # this consumes the memory
Out[9]: (array([0, 1]), array([0, 2]))  # note, this is rows, cols

In [10]: for col, aval in enumerate(a):  # this will be a lighter memory burden
    ...:     rows = np.where(aval == b)[0]  # rows of b equal to this element of a
    ...:     if not rows.size:
    ...:         continue
    ...:     for row in rows:
    ...:         print(row, col)
    ...: 
0 0
1 2

For clarity:

In [14]: a==b
Out[14]: 
array([[ True, False, False, False],
       [False, False,  True, False],
       [False, False, False, False]])
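
If you want the result as an actual sparse matrix rather than printed index pairs, one option is to collect the (row, col) indices slab by slab and hand them to scipy.sparse. This is only a rough sketch: the function name sparse_equal and the chunk parameter are made up for illustration, and it assumes a and b are ordinary dense 1-D/2-D arrays as in the session above, not the csr_matrix inputs from the question. Peak memory for the comparison is about b.size * chunk booleans instead of b.size * a.size.

```python
import numpy as np
from scipy.sparse import coo_matrix


def sparse_equal(a, b, chunk=1024):
    """Sparse boolean matrix c with c[i, j] = (b[i] == a[j]).

    Hypothetical helper: `a` plays the role of the row vector (columns of c),
    `b` the column vector (rows of c). `chunk` is the number of columns
    compared densely at a time.
    """
    a = np.asarray(a).ravel()
    b = np.asarray(b).ravel()
    rows, cols = [], []
    for start in range(0, a.size, chunk):
        # Dense comparison restricted to a slab of at most `chunk` columns.
        block = b[:, None] == a[None, start:start + chunk]
        r, c = np.nonzero(block)
        rows.append(r)
        cols.append(c + start)  # shift slab-local column index back to global
    if rows:
        rows = np.concatenate(rows)
        cols = np.concatenate(cols)
    else:
        # Empty `a`: no slabs were processed, so there are no matches.
        rows = np.empty(0, dtype=np.intp)
        cols = np.empty(0, dtype=np.intp)
    data = np.ones(rows.size, dtype=bool)
    return coo_matrix((data, (rows, cols)), shape=(b.size, a.size)).tocsr()
```

With the a and b from the session above, sparse_equal(a, b).toarray() should reproduce the dense a==b array. Note that this only saves memory when matches are rare; if most pairs are equal, the index arrays end up larger than the dense boolean result.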

Upvotes: 2
