Reputation: 103
I am wondering what the best way is to write code that can deal with sparse and dense matrices. The following example shows my concern:
input:
A = np.asmatrix(np.ones((2,3)))
B = sps.csr_matrix(A)
x = np.ones(3)
print(A.dot(x).shape, B.dot(x).shape)
output:
(1, 2) (2,)
The first result seems wrong from a mathematical perspective. Feeding the result to a function that expects a vector gives an error. I could cast the output into an array and squeeze it:
print(np.asarray(A.dot(x)).squeeze().shape)
(2,)
Is this the best solution?
Upvotes: 0
Views: 43
Reputation: 231335
In [135]: A.dot(x)
Out[135]: matrix([[3., 3.]])
In [136]: np.ones((2,3)).dot(x)
Out[136]: array([3., 3.])
np.matrix
is a subclass of ndarray
that must be 2d, and tries, as much as possible, to return np.matrix
(2d) results. Hence Out[135]
is 2d (1,2).
Out[136]
on the other hand is a (2,3) dot (3,) => (2,). But it is worth keeping mind that dot
has some special rules when one or other of the inputs is 1d.
Pure np.matrix
dot:
In [161]: A.dot(np.matrix(x).T) # A*np.matrix(x).T
Out[161]:
matrix([[3.],
[3.]])
While sparse
matrix is modeled on np.matrix
, it is its own class, and behaves differently in many cases. dot
of sparse and dense can be tricky.
In [152]: B.shape
Out[152]: (2, 3)
In [153]: B.dot(x)
Out[153]: array([3., 3.])
Effectively the B
has been turned into an array first:
In [154]: B.A.dot(x) # B.toarray()
Out[154]: array([3., 3.])
But this is wrong, because it attempts to use np.array(B)
instead of the proper B.toarray()
:
In [155]: np.dot(B,x)
Out[155]:
array([<2x3 sparse matrix of type '<class 'numpy.float64'>'
with 6 stored elements in Compressed Sparse Row format>,
<2x3 sparse matrix of type '<class 'numpy.float64'>'
with 6 stored elements in Compressed Sparse Row format>,
<2x3 sparse matrix of type '<class 'numpy.float64'>'
with 6 stored elements in Compressed Sparse Row format>], dtype=object)
In [156]: np.array(B)
Out[156]:
array(<2x3 sparse matrix of type '<class 'numpy.float64'>'
with 6 stored elements in Compressed Sparse Row format>, dtype=object)
If both variables are sparse, then the result is also sparse. In this case the special dot
handling of 1d arrays does not apply. A (2,3) has to multiply a (3,1), producing a (2,1):
In [158]: B*sparse.csr_matrix(x).T
Out[158]:
<2x1 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>
In [159]: _.A
Out[159]:
array([[3.],
[3.]])
In sum, mixing matrix types can be confusing. It would be cleanest to convert everything to ndarray
first. But it really depends on what kind of output you want.
Upvotes: 1