Claudius Zibrowius
Claudius Zibrowius

Reputation: 11

Is there a python library for sparse matrix operations for non-standard algebra-like objects?

Summary: I am looking for a way to do computations with sparse matrices whose non-zero entries are not the usual integers/floats/etc., but elements of an algebra, ie instances of a non-standard python class with addition, multiplication and a zero element.

It works fine for dense matrices. I have implemented this algebra by defining a python class algebra and overloading addition and multiplication:

class algebra(object):
   ...
   __mul__(self,other):
      ...
   __add__(self,other):
      ...

numpy allows me to define vectors and matrices whose entries are instances of the class algebra. It also allows me to perform all the usual operations like matrix multiplication/addition/tensordot/slicing/etc., so it is all working just as for matrices over integers/floats/etc.

It does not work for sparse matrices. To speed up computations, I would now like to replace these dense matrices by sparse ones. I have tried to make this work with SciPy's 2-D sparse matrix package scipy.sparse, but I have failed so far. I can populate instances of these sparse matrix classes by my algebra elements, but whenever I do computations with them, I get an error message like

TypeError: no supported conversion for types: (dtype('O'),dtype('O'))

To me, this suggests that there is a restriction on the type of objects that are supported by scipy.sparse. I do not see any mathematical reason for why the operations for sparse matrices should care about the object type. As long as the class has all the operations of floats, say, it should work. What am I missing? Is there an alternative to scipy.sparse which supports arbitrary object types?


Below is a minimal working example. Note that I have implemented the zero element of the algebra in terms of the usual integer 0. Please also note that the actual algebra I am interested in is more complicated than the real integers!

import numpy as np
from scipy.sparse import csr_matrix

class algebra(object): # the algebra of the real integers

    def __init__(self,num):
        self.num = num

    def __add__(self,other):
        if isinstance(other, self.__class__):
            return algebra(self.num+other.num)
        else:
            return self

    def __radd__(self,other):
        if isinstance(other, self.__class__):
            return algebra(self.num+other.num)
        else:
            return self

    def __mul__(self,other):
        if isinstance(other, self.__class__):
            return algebra(self.num*other.num)
        else:
            return 0

    def __rmul__(self,other):
        if isinstance(other, self.__class__):
            return algebra(self.num*other.num)
        else:
            return 0

    def __repr__(self):
        return "algebra:"+str(self.num)  

a=algebra(5)
print(a*a)
print(a*0)
print(0*a)
indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([a,a,a,a,a,a])
S = csr_matrix((data, indices, indptr), shape=(3, 3))
print(S)
print("Everything works fine up to here.")
S*S    

The output is:

algebra:25
0
0
  (0, 0)    algebra:5
  (0, 2)    algebra:5
  (1, 2)    algebra:5
  (2, 0)    algebra:5
  (2, 1)    algebra:5
  (2, 2)    algebra:5
Everything works fine up to here.
Traceback (most recent call last):
  File "test", line 46, in <module>
    S*S    
  File "/usr/lib/python3/dist-packages/scipy/sparse/base.py", line 319, in __mul__
    return self._mul_sparse_matrix(other)
  File "/usr/lib/python3/dist-packages/scipy/sparse/compressed.py", line 499, in _mul_sparse_matrix
    data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))
  File "/usr/lib/python3/dist-packages/scipy/sparse/sputils.py", line 57, in upcast
    raise TypeError('no supported conversion for types: %r' % (args,))
TypeError: no supported conversion for types: (dtype('O'), dtype('O'))

I am using Python 3.5.2 on linux.

Upvotes: 1

Views: 556

Answers (1)

hpaulj
hpaulj

Reputation: 231375

This may fall more in the comment category, but as an answer I can make it longer, and edit it more.

numpy arrays implement object dtype by storing pointers/references to the objects in the array's data buffer. Math is done by delegating the task to object methods. The iteration is essentially at Python speeds, comparable to list comprehension (may be even a bit slower). numpy does not do its fast compiled math on these objects.

scipy.sparse has not developed this kind of functionality. A coo format matrix can probably be created with the object inputs - but that's because it doesn't do much. In fact if the data, row and col inputs have the right numpy array setup, they are uses as coo attributes without change.

Apparently making csr as you do with the indptr etc also just assigns the attributes. A coo to csr conversion might not work so well, since that involves summation of duplicates.

In any case csr math code uses a mix of python and c (cython), and the compiled part works with a limited number of numeric types - long and double integers and floats. I don't think it even works for short ints (int8, int16). It does not implement any of the object dtype delegating that ndarrays do.

With your S:

In [187]: S.A                                                                                                
...
ValueError: unsupported data types in input

In [188]: S.tocoo()                                                                                          
Out[188]: 
<3x3 sparse matrix of type '<class 'numpy.object_'>'
    with 6 stored elements in COOrdinate format>

no value changes are required for tocoo. But back to csr requires summing duplicates:

In [189]: S.tocoo().tocsr()                                                                                  
 ...
TypeError: no supported conversion for types: (dtype('O'),)

In [190]: S.tolil()                                                                                          
/usr/local/lib/python3.6/dist-packages/scipy/sparse/sputils.py:115: UserWarning: object dtype is not supported by sparse matrices
  warnings.warn("object dtype is not supported by sparse matrices")
Out[190]: 
<3x3 sparse matrix of type '<class 'numpy.object_'>'
    with 6 stored elements in LInked List format>

There's no problem in storing this object data

Math with a list of your objects versus an array - similar times:

In [192]: alist = [a]*100                                                                                    
In [193]: arr = np.array(alist)                                                                              
In [194]: timeit [i*j for i,j in zip(alist,alist)]                                                           
77.9 µs ± 272 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [195]: timeit arr*arr                                                                                     
75.1 µs ± 2.29 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

An earlier question, which you may have already seen (I just got an upvote), about using int16 in sparse matrices. Same basic issue:

Why can't I assign data to part of sparse matrix in the first "try:"?

The symbolics library has a sparse matrix module: https://docs.sympy.org/latest/modules/matrices/sparse.html

Pandas has its own sparse Series/Dataframe implementations

https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.coo_matrix.html#scipy.sparse.coo_matrix

By default when converting to CSR or CSC format, duplicate (i,j) entries will be summed together. This facilitates efficient construction of finite element matrices and the like. (see example)

Upvotes: 1

Related Questions