Avoid implicit conversion to matrix in numpy operations

Is there a way to globally prevent np.matrix from appearing in the results of numpy computations? For example, if x is a numpy.ndarray and y is a scipy.sparse.csc_matrix, then after x += y, x has become a matrix. Is there a way to prevent that, i.e., keep x an ndarray, and more generally keep getting an ndarray everywhere a matrix would otherwise be produced?

Answers (2)

denis

Yes, it's a bug, but https://github.com/scipy/scipy/issues/7826 says:

I do not really see a way to change this.

Below is an X += c * Y that avoids todense().
inc() has been tested with some combinations of dense array/matrix and sparse types, but certainly not all.

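import numpy as np

def inc( X, Y, c=1. ):
    """ X += c * Y, X Y sparse or dense """
    if (not hasattr( X, "indices" )  # dense += sparse
    and hasattr( Y, "indices" )):
        # inc an ndarray view, because ndarray += sparse -> matrix.
        # Assumes Y is a sparse vector (1-row CSR or 1-column CSC), so that
        # Y.indices are positions in the squeezed 1-d view of X.
        X = getattr( X, "A", X ).squeeze()
        # np.add.at also accumulates repeated indices (non-canonical Y)
        np.add.at( X, Y.indices, c * Y.data )
    else:
        X += c * Y  # sparse + different sparse: SparseEfficiencyWarning
    return X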

hpaulj

I added the scipy tag. This is a scipy.sparse problem, not an np.matrix one.

In [250]: y=sparse.csr_matrix([[0,1],[1,0]])
In [251]: x=np.arange(2)
In [252]: y+x
Out[252]: 
matrix([[0, 2],
        [1, 1]])

So sparse + array => matrix.

(As a side note, np.matrix is a subclass of np.ndarray, but sparse.csr_matrix is not a subclass. It has many numpy-like operations, but it implements them in its own code.)
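A quick way to check that difference yourself (a small sketch assuming NumPy and SciPy are installed; `s` is just an illustrative name):

```python
import numpy as np
from scipy import sparse

s = sparse.csr_matrix([[0, 1], [1, 0]])

# np.matrix is a genuine ndarray subclass...
print(issubclass(np.matrix, np.ndarray))  # True

# ...whereas a sparse matrix only imitates parts of numpy's interface
print(isinstance(s, np.ndarray))          # False
print(sparse.issparse(s))                 # True
```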

In [255]: x += y
In [256]: x
Out[256]: 
matrix([[0, 2],
        [1, 1]])

Technically this shouldn't happen: in effect it is doing x = x + y, binding a new object to x rather than modifying x in place.

If I first turn y into a regular dense matrix, I get an error instead, because allowing the operation would change the 1-d x into a 2-d array.

In [258]: x += y.todense()
...
ValueError: non-broadcastable output operand with shape (2,) doesn't match the broadcast shape (2,2)

Changing x to 2-d allows the addition to proceed, without turning the array into a matrix:

In [259]: x=np.eye(2)
In [260]: x
Out[260]: 
array([[ 1.,  0.],
       [ 0.,  1.]])
In [261]: x += y.todense()
In [262]: x
Out[262]: 
array([[ 1.,  1.],
       [ 1.,  1.]])

In general, addition and subtraction with sparse matrices are tricky. Sparse matrices were designed for matrix multiplication, and multiplication doesn't change sparsity as much as addition does. y + 1, for example, would make every element nonzero, i.e. the result would be dense, so sparse refuses to do it at all.
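A small illustration of that asymmetry, using the same 2x2 y (a sketch; the printed error message text may vary between SciPy versions):

```python
import numpy as np
from scipy import sparse

y = sparse.csr_matrix([[0, 1], [1, 0]])

# Scaling keeps the sparsity pattern: still 2 stored values
scaled = y * 3
print(scaled.nnz)  # 2

# Adding a nonzero scalar would fill in every entry, so sparse refuses
raised = False
try:
    y + 1
except NotImplementedError as exc:
    raised = True
    print("NotImplementedError:", exc)

# Done densely, the result has no zeros left
dense = y.toarray() + 1
print(np.count_nonzero(dense))  # 4
```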

Without digging into the details of how sparse addition is coded, my advice is: don't attempt this x += ... operation without first turning y into a dense version.

In [265]: x += y.A
In [266]: x
Out[266]: 
array([[ 1.,  2.],
       [ 2.,  1.]])

I can't think of a good reason not to do this.
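One caveat if you try this today: as far as I know, recent SciPy releases have deprecated and then removed the `.A` shorthand on sparse matrices (np.matrix itself still has `.A`). `y.toarray()` does the same job and always returns a plain ndarray. A minimal sketch of the same in-place update written that way, starting again from np.eye(2):

```python
import numpy as np
from scipy import sparse

y = sparse.csr_matrix([[0, 1], [1, 0]])
x = np.eye(2)
before = id(x)

# toarray() returns a plain ndarray, so this is an ordinary
# ndarray += ndarray and x is updated in place
x += y.toarray()

print(type(x).__name__, id(x) == before)  # ndarray True
print(x)
# [[1. 1.]
#  [1. 1.]]
```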

(I should check the scipy github for a bug issue on this).

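scipy/sparse/compressed.py has the csr addition code. x + y calls x.__add__(y), but if that returns NotImplemented, Python flips it to y.__radd__(x), which for sparse matrices ends up in the same __add__ code. x += y calls x.__iadd__(y), and if that returns NotImplemented, Python falls back to x = x + y. So I may need to examine __iadd__ for ndarray as well.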
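That fallback can be reproduced in plain Python. This is a toy sketch: `Dense`, `Sparse` and `Matrix` are hypothetical stand-ins for ndarray, a sparse matrix and np.matrix, and `Dense` is written to decline (return NotImplemented) whenever the other operand is `Sparse`, which is an assumption meant to mirror what ndarray does with a sparse operand:

```python
class Matrix:
    """Hypothetical stand-in for np.matrix (what the sparse side returns)."""
    def __init__(self, value):
        self.value = value


class Sparse:
    """Hypothetical stand-in for a sparse matrix."""
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # sparse + dense: compute densely and hand back a Matrix
        return Matrix(self.value + other.value)

    __radd__ = __add__  # dense + sparse ends up in the same code


class Dense:
    """Hypothetical stand-in for ndarray."""
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Sparse):
            return NotImplemented  # let the sparse operand handle it
        return Dense(self.value + other.value)

    def __iadd__(self, other):
        if isinstance(other, Sparse):
            return NotImplemented  # decline the in-place version as well
        self.value += other.value  # genuine in-place update
        return self


x = Dense(1)
original = x

x += Dense(2)  # handled by Dense.__iadd__
print(x is original, x.value)  # True 3

x += Sparse(10)  # __iadd__ and __add__ decline, so Sparse.__radd__ runs
print(x is original, type(x).__name__, x.value)  # False Matrix 13
```

The name x ends up bound to a brand-new Matrix while the original object is left untouched, which is the same pattern observed with ndarray += sparse.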

But the basic addition for a sparse matrix is:

def __add__(self,other):
    # First check if argument is a scalar
    if isscalarlike(other):
        if other == 0:
            return self.copy()
        else:  # Now we would add this scalar to every element.
            raise NotImplementedError('adding a nonzero scalar to a '
                                      'sparse matrix is not supported')
    elif isspmatrix(other):
        if (other.shape != self.shape):
            raise ValueError("inconsistent shapes")

        return self._binopt(other,'_plus_')
    elif isdense(other):
        # Convert this matrix to a dense matrix and add them
        return self.todense() + other
    else:
        return NotImplemented

So y + x becomes y.todense() + x, and x + y ends up in the same code via y.__radd__(x).

Regardless of the += details, it is clear that adding a sparse to a dense (array or np.matrix) involves converting the sparse to dense. There's no code that iterates through the sparse values and adds those selectively to the dense array.
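For comparison, here is roughly what such selective code could look like. `add_sparse_inplace` is a hypothetical helper, not a SciPy function; it assumes x is a 2-d ndarray with the same shape as y, converts y to COO form to get its (row, col, value) triplets, and uses np.add.at so that repeated coordinates are accumulated rather than overwritten:

```python
import numpy as np
from scipy import sparse


def add_sparse_inplace(x, y, c=1.0):
    """Hypothetical helper: x += c * y for a 2-d ndarray x and sparse y.

    Only y's stored entries are touched and x is never replaced, so it
    stays an ndarray.
    """
    if x.shape != y.shape:
        raise ValueError(f"shape mismatch: {x.shape} vs {y.shape}")
    coo = y.tocoo()  # stored entries as (row, col, data) triplets
    # np.add.at accumulates even if a (row, col) pair occurs more than once
    np.add.at(x, (coo.row, coo.col), c * coo.data)
    return x


x = np.eye(2)
y = sparse.csc_matrix([[0, 1], [1, 0]])
before = id(x)

add_sparse_inplace(x, y)
print(type(x).__name__, id(x) == before)  # ndarray True
print(x)
# [[1. 1.]
#  [1. 1.]]
```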

It's only when both operands are sparse that it performs a special sparse addition. y + y works, returning a sparse matrix; y += y fails with a NotImplementedError from sparse.base.__iadd__.
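The sparse + sparse case as a self-contained snippet (y is rebuilt so it runs on its own):

```python
import numpy as np
from scipy import sparse

y = sparse.csr_matrix([[0, 1], [1, 0]])

# Both operands sparse: scipy's own sparse addition, result stays sparse
z = y + y
print(sparse.issparse(z), z.nnz)  # True 2
print(z.toarray())
# [[0 2]
#  [2 0]]
```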

This is the best diagnostic sequence I've come up with, trying various ways of adding y to a (2,2) array.

In [348]: x=np.eye(2)
In [349]: x+y
Out[349]: 
matrix([[ 1.,  1.],
        [ 1.,  1.]])
In [350]: x+y.todense()
Out[350]: 
matrix([[ 1.,  1.],
        [ 1.,  1.]])

Addition produces a matrix, but the values can be written into x without changing x's class (or shape):

In [351]: x[:] = x+y
In [352]: x
Out[352]: 
array([[ 1.,  1.],
       [ 1.,  1.]])
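The same slice-assignment trick as a self-contained sketch: the (matrix) result of x + y is computed first, then its values are copied into x's existing buffer, so x keeps its class, shape and identity:

```python
import numpy as np
from scipy import sparse

y = sparse.csr_matrix([[0, 1], [1, 0]])
x = np.eye(2)
before = id(x)

# x + y is a new object; x[:] = ... copies its values into x's own buffer
x[:] = x + y

print(type(x).__name__, x.shape, id(x) == before)  # ndarray (2, 2) True
```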

+= with a dense matrix also keeps x an array:

In [353]: x += y.todense()
In [354]: x
Out[354]: 
array([[ 1.,  2.],
       [ 2.,  1.]])

But something in += with a sparse operand changes the class of x:

In [355]: x += y
In [356]: x
Out[356]: 
matrix([[ 1.,  3.],
        [ 3.,  1.]])

Further testing, looking at id(x) and x.__array_interface__, makes it clear that x += y replaces x. This is true even if x starts as an np.matrix. So the sparse += is not an in-place operation, whereas x += y.todense() is.
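A sketch of that check. `data_ptr` is a hypothetical convenience function that reads the buffer address from `__array_interface__["data"]`. The dense case is asserted; the sparse case is only printed, since its outcome depends on the NumPy/SciPy versions (in the session above it replaced x with a matrix):

```python
import numpy as np
from scipy import sparse


def data_ptr(a):
    """Address of a's data buffer, read from the array interface."""
    return a.__array_interface__["data"][0]


y = sparse.csr_matrix([[0, 1], [1, 0]])

# Dense operand: same object and same buffer afterwards
x = np.eye(2)
ptr, ident = data_ptr(x), id(x)
x += y.todense()
same_buffer = data_ptr(x) == ptr and id(x) == ident
print(same_buffer, type(x).__name__)  # True ndarray

# Sparse operand: compare identity and class again
x = np.eye(2)
ident = id(x)
x += y
print(id(x) == ident, type(x).__name__)
```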
