Reputation: 11
Summary: I am looking for a way to do computations with sparse matrices whose non-zero entries are not the usual integers/floats/etc., but elements of an algebra, ie instances of a non-standard python class with addition, multiplication and a zero element.
It works fine for dense matrices. I have implemented this algebra by defining a python class algebra
and overloading addition and multiplication:
class algebra(object):
...
__mul__(self,other):
...
__add__(self,other):
...
numpy
allows me to define vectors and matrices whose entries are instances of the class algebra
. It also allows me to perform all the usual operations like matrix multiplication/addition/tensordot/slicing/etc., so it is all working just as for matrices over integers/floats/etc.
It does not work for sparse matrices.
To speed up computations, I would now like to replace these dense matrices by sparse ones. I have tried to make this work with SciPy's 2-D sparse matrix package scipy.sparse
, but I have failed so far. I can populate instances of these sparse matrix classes by my algebra elements, but whenever I do computations with them, I get an error message like
TypeError: no supported conversion for types: (dtype('O'),dtype('O'))
To me, this suggests that there is a restriction on the type of objects that are supported by scipy.sparse
. I do not see any mathematical reason for why the operations for sparse matrices should care about the object type. As long as the class has all the operations of floats, say, it should work. What am I missing? Is there an alternative to scipy.sparse
which supports arbitrary object types?
Below is a minimal working example. Note that I have implemented the zero element of the algebra in terms of the usual integer 0. Please also note that the actual algebra I am interested in is more complicated than the real integers!
import numpy as np
from scipy.sparse import csr_matrix
class algebra(object): # the algebra of the real integers
def __init__(self,num):
self.num = num
def __add__(self,other):
if isinstance(other, self.__class__):
return algebra(self.num+other.num)
else:
return self
def __radd__(self,other):
if isinstance(other, self.__class__):
return algebra(self.num+other.num)
else:
return self
def __mul__(self,other):
if isinstance(other, self.__class__):
return algebra(self.num*other.num)
else:
return 0
def __rmul__(self,other):
if isinstance(other, self.__class__):
return algebra(self.num*other.num)
else:
return 0
def __repr__(self):
return "algebra:"+str(self.num)
a=algebra(5)
print(a*a)
print(a*0)
print(0*a)
indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([a,a,a,a,a,a])
S = csr_matrix((data, indices, indptr), shape=(3, 3))
print(S)
print("Everything works fine up to here.")
S*S
The output is:
algebra:25
0
0
(0, 0) algebra:5
(0, 2) algebra:5
(1, 2) algebra:5
(2, 0) algebra:5
(2, 1) algebra:5
(2, 2) algebra:5
Everything works fine up to here.
Traceback (most recent call last):
File "test", line 46, in <module>
S*S
File "/usr/lib/python3/dist-packages/scipy/sparse/base.py", line 319, in __mul__
return self._mul_sparse_matrix(other)
File "/usr/lib/python3/dist-packages/scipy/sparse/compressed.py", line 499, in _mul_sparse_matrix
data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))
File "/usr/lib/python3/dist-packages/scipy/sparse/sputils.py", line 57, in upcast
raise TypeError('no supported conversion for types: %r' % (args,))
TypeError: no supported conversion for types: (dtype('O'), dtype('O'))
I am using Python 3.5.2 on linux.
Upvotes: 1
Views: 556
Reputation: 231375
This may fall more in the comment category, but as an answer I can make it longer, and edit it more.
numpy
arrays implement object dtype by storing pointers/references to the objects in the array's data buffer. Math is done by delegating the task to object methods. The iteration is essentially at Python speeds, comparable to list comprehension (may be even a bit slower). numpy
does not do its fast compiled math on these objects.
scipy.sparse
has not developed this kind of functionality. A coo
format matrix can probably be created with the object inputs - but that's because it doesn't do much. In fact if the data
, row
and col
inputs have the right numpy
array setup, they are uses as coo
attributes without change.
Apparently making csr
as you do with the indptr
etc also just assigns the attributes. A coo
to csr
conversion might not work so well, since that involves summation of duplicates.
In any case csr
math code uses a mix of python and c (cython), and the compiled part works with a limited number of numeric types - long and double integers and floats. I don't think it even works for short ints (int8
, int16
). It does not implement any of the object dtype delegating that ndarrays
do.
With your S
:
In [187]: S.A
...
ValueError: unsupported data types in input
In [188]: S.tocoo()
Out[188]:
<3x3 sparse matrix of type '<class 'numpy.object_'>'
with 6 stored elements in COOrdinate format>
no value changes are required for tocoo
. But back to csr
requires summing duplicates:
In [189]: S.tocoo().tocsr()
...
TypeError: no supported conversion for types: (dtype('O'),)
In [190]: S.tolil()
/usr/local/lib/python3.6/dist-packages/scipy/sparse/sputils.py:115: UserWarning: object dtype is not supported by sparse matrices
warnings.warn("object dtype is not supported by sparse matrices")
Out[190]:
<3x3 sparse matrix of type '<class 'numpy.object_'>'
with 6 stored elements in LInked List format>
There's no problem in storing this object data
Math with a list of your objects versus an array - similar times:
In [192]: alist = [a]*100
In [193]: arr = np.array(alist)
In [194]: timeit [i*j for i,j in zip(alist,alist)]
77.9 µs ± 272 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [195]: timeit arr*arr
75.1 µs ± 2.29 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
An earlier question, which you may have already seen (I just got an upvote), about using int16
in sparse matrices. Same basic issue:
Why can't I assign data to part of sparse matrix in the first "try:"?
The symbolics library has a sparse matrix module: https://docs.sympy.org/latest/modules/matrices/sparse.html
Pandas has its own sparse Series/Dataframe implementations
By default when converting to CSR or CSC format, duplicate (i,j) entries will be summed together. This facilitates efficient construction of finite element matrices and the like. (see example)
Upvotes: 1