Reputation: 340
I ran into the following issue trying to vstack two large CSR matrices:
/usr/lib/python2.7/dist-packages/scipy/sparse/coo.pyc in _check(self)
229 raise ValueError('negative row index found')
230 if self.col.min() < 0:
--> 231 raise ValueError('negative column index found')
232
233 def transpose(self, copy=False):
ValueError: negative column index found
I can reproduce this error very simply by trying to convert a large lil matrix to a coo matrix. The following code works for N=10**9 but fails for N=10**10.
from scipy import sparse
from numpy import random
N=10**10
x = sparse.lil_matrix( (1,N) )
for _ in xrange(1000):
    x[0,random.randint(0,N-1)]=random.randint(1,100)
y = sparse.coo_matrix(x)
Is there a size limit I am hitting for coo matrices? Is there a way around this?
Upvotes: 11
Views: 6260
Reputation: 23480
Interestingly, your second example runs fine on my installation.
The error message `negative column index found` sounds like an overflow somewhere. I checked the newest source with the following results:
scipy.sparse.sputils.get_index_dtype
scipy.sparse.coo
The exception comes from this kind of code:
idx_dtype = get_index_dtype(maxval=max(self.shape))
self.row = np.asarray(self.row, dtype=idx_dtype)
self.col = np.asarray(self.col, dtype=idx_dtype)
self.data = to_native(self.data)
if nnz > 0:
    if self.row.max() >= self.shape[0]:
        raise ValueError('row index exceeds matrix dimensions')
    if self.col.max() >= self.shape[1]:
        raise ValueError('column index exceeds matrix dimensions')
    if self.row.min() < 0:
        raise ValueError('negative row index found')
    if self.col.min() < 0:
        raise ValueError('negative column index found')
This is clearly an overflow error, probably at 2**31, the point where a signed 32-bit index wraps around to a negative value.
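To see why such an overflow shows up as a negative index, here is a minimal sketch of the wrap-around. It assumes the column indices get cast to int32 somewhere along the way; the index values are made up for illustration and are not taken from the question.

```python
import numpy as np

# Hypothetical column indices for a matrix with 10**10 columns.
cols = np.array([10**9, 3 * 10**9, 10**10 - 1], dtype=np.int64)

# Casting to int32 keeps only the low 32 bits, so some values
# at or above 2**31 come out negative.
wrapped = cols.astype(np.int32)

print(wrapped)
print(wrapped.min() < 0)  # the same kind of check that raises the ValueError
```

Only some of the large indices turn negative; others wrap to a valid-looking positive value, which is why the negative-index check is the first one to fire.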
If you want to debug it, try:
import scipy.sparse.sputils
import numpy as np
scipy.sparse.sputils.get_index_dtype((np.array(10**10),))
It should return int64. If it doesn't, the problem is there.
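If that function is not available in your SciPy version, the rule you are testing for can be sketched without SciPy. The helper below, `smallest_index_dtype`, is a hypothetical stand-in written for this answer, not SciPy's implementation; it only assumes the choice is between int32 and int64, based on the largest dimension.

```python
import numpy as np

def smallest_index_dtype(shape):
    """Hypothetical stand-in (not SciPy's get_index_dtype):
    use int32 if the largest dimension fits, otherwise int64."""
    if max(shape) > np.iinfo(np.int32).max:
        return np.int64
    return np.int32

print(smallest_index_dtype((1, 10**9)))   # fits in 32 bits
print(smallest_index_dtype((1, 10**10)))  # needs 64 bits
```

This matches what you observe: N=10**9 is below the int32 maximum of 2147483647, while N=10**10 is not.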
Which version of SciPy are you using?
Upvotes: 6
Reputation: 17787
Looks like you're hitting the limits of 32-bit integers. Here's a quick test:
In [14]: np.array([10**9, 10**10], dtype=np.int64)
Out[14]: array([ 1000000000, 10000000000])
In [15]: np.array([10**9, 10**10], dtype=np.int32)
Out[15]: array([1000000000, 1410065408], dtype=int32)
For now, most sparse matrix representations assume 32-bit integer indices, so they simply cannot support matrices that large.
EDIT: As of version 0.14, scipy now supports 64-bit indexing. If you can upgrade, this problem will go away.
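A minimal sketch to confirm this after upgrading, assuming SciPy 0.14 or later and a NumPy recent enough that `randint` accepts a `dtype` argument. It builds the COO matrix directly from index arrays instead of going through `lil_matrix`, and the seed and sizes are arbitrary.

```python
import numpy as np
from scipy import sparse

N = 10**10
rng = np.random.RandomState(0)  # arbitrary seed, for repeatability only

# 1000 random column positions in a single row of width N.
cols = rng.randint(0, N, size=1000, dtype=np.int64)
rows = np.zeros(1000, dtype=np.int64)
data = rng.randint(1, 100, size=1000)

y = sparse.coo_matrix((data, (rows, cols)), shape=(1, N))

print(y.shape)
print(y.col.dtype)  # should be a 64-bit integer type with 64-bit index support
```

If `y.col.dtype` still reports a 32-bit type, the installed SciPy predates the 64-bit index support.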
Upvotes: 8