Reputation: 902
Given a row vector a = np.array([1, 2, 3])
and a column vector b = np.array([[1], [2], [3]])
we can compare all elements one by one by executing c = a==b
which returns
>>> c
array([[ True, False, False],
       [False,  True, False],
       [False, False,  True]])
However, when the number of elements is very large, this demands a lot of memory. Is it possible to use the sparse matrices a and b below and compute a sparse matrix c efficiently?
import numpy as np
from scipy.sparse import csr_matrix
data = np.array([1, 2, 3])
row = np.array([0, 1, 2])
col = np.array([0, 0, 0])
a = csr_matrix((data, (row, col)), shape=(3, 1))
b = csr_matrix((data, (col, row)), shape=(1, 3))
Upvotes: 1
Views: 166
Reputation: 5372
To speed up a==b you can use numexpr.evaluate('a==b'); however, this won't eliminate the memory burden, because the full dense result is still allocated.
Instead, you can store only the indices where a==b is True:
In [5]: import numexpr
In [6]: import numpy as np
In [7]: a = np.array([1, 2, 3,4])
In [8]: b = np.array([[1], [3], [5]])
In [9]: np.where(numexpr.evaluate('a==b')) # this consumes the memory
Out[9]: (array([0, 1]), array([0, 2])) # note, this is rows, cols
In [10]: for col, aval in enumerate(a):  # this is a lighter memory burden
    ...:     rows = np.where(aval==b)[0]
    ...:     if not rows.size:
    ...:         continue
    ...:     for row in rows:
    ...:         print(row, col)
    ...:
0 0
1 2
For clarity:
In [14]: a==b
Out[14]:
array([[ True, False, False, False],
       [False, False,  True, False],
       [False, False, False, False]])
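If you want an actual sparse c rather than printed index pairs, the per-column loop above can collect its hits and hand them to scipy.sparse. The following is only a minimal sketch under some assumptions: the helper name sparse_equal is made up for illustration, and a and b are taken as dense 1-D (or flattenable) arrays rather than the csr_matrix inputs from the question.

```python
import numpy as np
from scipy.sparse import coo_matrix


def sparse_equal(a, b):
    """Hypothetical helper: sparse boolean c with c[i, j] = (b[i] == a[j])."""
    a = np.asarray(a).ravel()
    b = np.asarray(b).ravel()
    row_chunks, col_chunks = [], []
    # One column at a time, so only a length-len(b) temporary is ever dense.
    for col, aval in enumerate(a):
        hits = np.flatnonzero(b == aval)
        if hits.size:
            row_chunks.append(hits)
            col_chunks.append(np.full(hits.size, col))
    if row_chunks:
        rows = np.concatenate(row_chunks)
        cols = np.concatenate(col_chunks)
    else:
        # No matches at all: build an empty sparse matrix of the right shape.
        rows = np.empty(0, dtype=int)
        cols = np.empty(0, dtype=int)
    data = np.ones(rows.size, dtype=bool)
    return coo_matrix((data, (rows, cols)), shape=(b.size, a.size)).tocsr()
```

Calling sparse_equal(a, b) with the a and b from the session above should give a 3x4 matrix that stores only the two True entries. The Python-level loop over a is still slow for very long vectors, so treat this as a way to bound memory, not as the fastest option.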
Upvotes: 2