Reputation: 5539
I am using scipy.sparse in my application and want to do some performance tests. In order to do that, I need to create a large sparse matrix (which I will then use in my application). As long as the matrix is small, I can create it using the command
import scipy.sparse as sp
a = sp.rand(1000,1000,0.01)
This results in a 1000 by 1000 matrix with 10,000 nonzero entries (a reasonable density, meaning approximately 10 nonzero entries per row).
The problem arises when I try to create a larger matrix, for example a 100,000 by 100,000 matrix (I have dealt with much larger matrices before). I run
import scipy.sparse as sp
N = 100000
d = 0.0001
a = sp.rand(N, N, d)
which should result in a 100,000 by 100,000 matrix with one million nonzero entries (well within the realm of the possible), but I get an error message:
Traceback (most recent call last):
File "<pyshell#6>", line 1, in <module>
sp.rand(100000,100000,0.0000001)
File "C:\Python27\lib\site-packages\scipy\sparse\construct.py", line 723, in rand
j = random_state.randint(mn)
File "mtrand.pyx", line 935, in mtrand.RandomState.randint (numpy\random\mtrand\mtrand.c:10327)
OverflowError: Python int too large to convert to C long
This is some annoying internal scipy error that I cannot get rid of.
I understand that I can create a 10*n by 10*n matrix by creating one hundred n by n matrices and then stacking them together. However, I think that scipy.sparse should be able to handle the creation of large sparse matrices (I say again, 100k by 100k is by no means large, and scipy is more than comfortable handling matrices with several million rows). Am I missing something?
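For reference, the stacking workaround mentioned above could look roughly like this. It is only a sketch: the helper name `block_random` and the block counts are made up for illustration, and each block is kept small enough that its number of cells stays below the 32 bit limit.

```python
import scipy.sparse as sp

def block_random(n_blocks, block_size, density, fmt="csr"):
    """Assemble an (n_blocks*block_size)-square random sparse matrix.

    Hypothetical helper: every block is generated separately with sp.rand,
    so each call only deals with block_size * block_size cells.
    """
    blocks = [
        [sp.rand(block_size, block_size, density) for _ in range(n_blocks)]
        for _ in range(n_blocks)
    ]
    # bmat glues the grid of blocks into one sparse matrix.
    return sp.bmat(blocks, format=fmt)

# Small demonstration: a 4 x 4 grid of 250 x 250 blocks gives 1000 x 1000.
a = block_random(4, 250, 0.01)
print(a.shape)
```

For the case in the question this would be called as `block_random(10, 10000, 0.0001)`, so that each block has 10^8 cells.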
Upvotes: 3
Views: 1961
Reputation: 35761
Without getting to the bottom of the issue, you should make sure that you are using a 64 bit build on a 64 bit architecture, on a Linux platform. There, the native "long" data type is of 64 bit size (as opposed to Windows, I believe).
For reference, see these tables:
Edit: Maybe I was not explicit enough before -- on a 64 bit Windows, the classical native "long" data type is of 32 bit size (also see this question). This might be a problem in your case. That is, your code might just work when you change platform to Linux. I cannot say this with absolute certainty, because it really depends on which native data types are used in the numpy/scipy C source (of course there are 64 bit data types available on Windows, and usually a platform case analysis is performed with compiler directives, and proper types are chosen via macros -- I cannot really imagine that they've used 32 bit data types by accident).
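To see which case applies on a given machine, a quick check along these lines should do. It is a minimal sketch using only the standard library `ctypes` module; the assumption that the random draw ranges over N*N cells is taken from the `randint(mn)` line in the traceback, not from reading the SciPy source in detail.

```python
import ctypes

# Width of the platform's native C "long" in bits
# (32 on Windows, 64 on typical 64 bit Linux builds).
c_long_bits = 8 * ctypes.sizeof(ctypes.c_long)
c_long_max = 2 ** (c_long_bits - 1) - 1

# Judging from the traceback, sp.rand(N, N, d) passes mn = N * N to randint.
N = 100000
mn = N * N
fits = mn <= c_long_max

print("C long: %d bit, max value %d" % (c_long_bits, c_long_max))
print("N*N = %d fits into a C long: %s" % (mn, fits))
```

If the last line reports False, the overflow from the question is to be expected on that build.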
Edit 2:
I can provide three data samples supporting my hypothesis.
Debian 64 bit, Python 2.7.3 and SciPy 0.10.1 binaries from Debian repos:
Python 2.7.3 (default, Mar 13 2014, 11:03:55)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy; print scipy.__version__; import scipy.sparse as s; s.rand(100000, 100000, 0.0001).shape
0.10.1
(100000, 100000)
Windows 7 64 bit, 32 bit Python build, 32 bit SciPy 0.10.1 build, both from ActivePython:
ActivePython 2.7.5.6 (ActiveState Software Inc.) based on
Python 2.7.5 (default, Sep 16 2013, 23:16:52) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy; print scipy.__version__; import scipy.sparse as s; s.rand(100000, 100000, 0.0001).shape
0.10.1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\user\AppData\Roaming\Python\Python27\site-packages\scipy\sparse\construct.py", line 426, in rand
raise ValueError(msg % np.iinfo(tp).max)
ValueError: Trying to generate a random sparse matrix such as the product of dimensions is
greater than 2147483647 - this is not supported on this machine
Windows 7 64 bit, 64 bit ActivePython build, 64 bit SciPy 0.15.1 build (from Gohlke, built against MKL):
ActivePython 3.4.1.0 (ActiveState Software Inc.) based on
Python 3.4.1 (default, Aug 7 2014, 13:09:27) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy; scipy.__version__; import scipy.sparse as s; s.rand(100000, 100000, 0.0001).shape
'0.15.1'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python34\lib\site-packages\scipy\sparse\construct.py", line 723, in rand
j = random_state.randint(mn)
File "mtrand.pyx", line 935, in mtrand.RandomState.randint (numpy\random\mtrand\mtrand.c:10327)
OverflowError: Python int too large to convert to C long
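If changing platform is not an option, one possible workaround is to skip sp.rand and draw row and column indices separately, so that no random integer ever has to exceed N. This is a sketch under stated assumptions, not what sp.rand does internally: the helper name `random_sparse_coo` is made up, and because row/column pairs are drawn with replacement, duplicates are merged on conversion and the result can have slightly fewer nonzero entries than requested.

```python
import numpy as np
import scipy.sparse as sp

def random_sparse_coo(n_rows, n_cols, density, seed=None):
    """Random sparse matrix built from independently drawn coordinates.

    Hypothetical helper: indices are drawn per axis, so the largest value
    requested from randint is max(n_rows, n_cols), not n_rows * n_cols.
    """
    rng = np.random.RandomState(seed)
    nnz = int(round(density * n_rows * n_cols))
    rows = rng.randint(0, n_rows, size=nnz)
    cols = rng.randint(0, n_cols, size=nnz)
    vals = rng.rand(nnz)
    m = sp.coo_matrix((vals, (rows, cols)), shape=(n_rows, n_cols))
    # Converting to CSR sums entries that landed on the same (row, col).
    return m.tocsr()

# Small demonstration; the call from the question would be
# random_sparse_coo(100000, 100000, 0.0001).
a = random_sparse_coo(1000, 1000, 0.01, seed=0)
print(a.shape, a.nnz)
```

At the densities discussed here collisions are rare, so the deviation from the target count should be small, but the values at merged positions are sums of two uniform draws rather than a single one.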
Upvotes: 3