Error converting large sparse matrix to COO

Question

I ran into the following issue trying to vstack two large CSR matrices:

    /usr/lib/python2.7/dist-packages/scipy/sparse/coo.pyc in _check(self)
    229                 raise ValueError('negative row index found')
    230             if self.col.min() < 0:
--> 231                 raise ValueError('negative column index found')
    232
    233     def transpose(self, copy=False):

ValueError: negative column index found

I can reproduce this error very simply by trying to convert a large lil matrix to a coo matrix. The following code works for N=10**9 but fails for N=10**10.

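    from scipy import sparse
    from numpy import random

    N = 10**10
    x = sparse.lil_matrix((1, N))
    # Fill 1000 random columns of the single row.
    for _ in range(1000):
        # numpy's randint excludes the upper bound, so randint(0, N) can hit every column.
        x[0, random.randint(0, N)] = random.randint(1, 100)

    y = sparse.coo_matrix(x)  # raises 'negative column index found' for N=10**10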

Is there a size limit I am hitting for coo matrices? Is there a way around this?

DrV · Accepted Answer

Interestingly, your second example runs without errors on my installation.

The error message 'negative column index found' suggests an integer overflow somewhere. I checked the newest source and found the following:

The actual index dtype is chosen in scipy.sparse.sputils.get_index_dtype.
The error message comes from the module scipy.sparse.coo.
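To make the dtype decision concrete, here is a minimal sketch of the rule as I understand it, assuming only the `maxval` path matters here. `choose_index_dtype` is a hypothetical, simplified stand-in, not SciPy's function; the real `get_index_dtype` also inspects the dtypes of any arrays passed to it.

    import numpy as np

    # Hypothetical, simplified stand-in for the index-dtype rule; the real
    # scipy.sparse.sputils.get_index_dtype also looks at existing index arrays.
    INT32_MAX = np.iinfo(np.int32).max  # 2147483647 == 2**31 - 1


    def choose_index_dtype(maxval):
        """Return int32 if maxval fits in a signed 32-bit int, else int64."""
        if maxval > INT32_MAX:
            return np.int64
        return np.int32


    # The two sizes from the question fall on opposite sides of the limit.
    for n in (10**9, 10**10):
        print(n, np.dtype(choose_index_dtype(n)).name)

If this rule is applied correctly, N=10**10 should get int64 indices, so an int32 result somewhere along the vstack/tocoo path would explain the failure.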

The exception comes from this kind of code:

    idx_dtype = get_index_dtype(maxval=max(self.shape))
    self.row = np.asarray(self.row, dtype=idx_dtype)
    self.col = np.asarray(self.col, dtype=idx_dtype)
    self.data = to_native(self.data)

    if nnz > 0:
        if self.row.max() >= self.shape[0]:
            raise ValueError('row index exceeds matrix dimensions')
        if self.col.max() >= self.shape[1]:
            raise ValueError('column index exceeds matrix dimensions')
        if self.row.min() < 0:
            raise ValueError('negative row index found')
        if self.col.min() < 0:
            raise ValueError('negative column index found')

It is clearly an overflow error, probably at 2**31: 10**9 is below the signed 32-bit limit of 2**31 - 1 = 2147483647, while 10**10 is above it.
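A sketch of how such an overflow would surface as this particular message, assuming the column indices end up cast to int32 somewhere. `check_columns` is a hypothetical helper that mirrors the two column checks quoted above; it is not part of SciPy.

    import numpy as np


    def check_columns(col, ncols, idx_dtype):
        """Mirror the column checks from coo._check after casting to idx_dtype.

        Hypothetical helper for illustration; not part of SciPy.
        """
        # astype() on an int64 array wraps silently when the target is too narrow.
        col = np.asarray(col, dtype=np.int64).astype(idx_dtype)
        if col.max() >= ncols:
            raise ValueError('column index exceeds matrix dimensions')
        if col.min() < 0:
            raise ValueError('negative column index found')
        return col


    N = 10**10
    cols = [5, 3 * 10**9]  # the second index lies above 2**31 - 1

    print(check_columns(cols, N, np.int64))  # fits: indices unchanged
    try:
        check_columns(cols, N, np.int32)  # 3*10**9 wraps to a negative value
    except ValueError as exc:
        print(exc)

With int32, 3*10**9 wraps to -1294967296, which passes the "exceeds" check and then trips the negative-index check, matching the traceback in the question.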

If you want to debug it, try:

    import numpy as np
    import scipy.sparse.sputils

    # Ask SciPy which index dtype it picks for an array holding 10**10.
    print(scipy.sparse.sputils.get_index_dtype((np.array(10**10),)))

It should return int64. If it doesn't, the problem is there.

Which version of SciPy are you using?
