Reputation: 4234
I use numpy to load a large matrix using 64-bit Python.
It works fine on a MacBook Pro with 8 GB of memory.
>>> import sys
>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> mb = MultiLabelBinarizer()
>>> matrix = mb.fit_transform(questions_topics)
>>> sys.getsizeof(matrix)
47975472376
>>> matrix.shape
(2999967, 1999)
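That getsizeof value is consistent with a dense int64 array of this shape: rows × cols × 8 bytes per element, plus a small ndarray header. A quick check using the shape above:

>>> rows, cols = matrix.shape
>>> rows * cols * 8
47975472264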
But it raises a MemoryError on an Ubuntu Google VM instance with 16 GB of memory and 10 GB of swap:
>>> y = mb.fit_transform(questions_topics)
/home/Liwink/anaconda3/lib/python3.5/site-packages/scipy/sparse/base.py in _process_toarray_args(self, order, out)
1037 return out
1038 else:
-> 1039 return np.zeros(self.shape, dtype=self.dtype, order=order)
1040
1041 def __numpy_ufunc__(self, func, method, pos, inputs, **kwargs):
MemoryError:
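The traceback shows where the memory goes: fit_transform builds the label matrix as a scipy sparse matrix and then converts it to a dense array, and that np.zeros call must allocate the full ~48 GB in one piece. A scaled-down sketch of the same effect (the shape and density here are made up for illustration):

>>> from scipy import sparse
>>> m = sparse.random(10000, 2000, density=0.001, format='csr')
>>> m.data.nbytes           # bytes for the stored nonzeros only
160000
>>> m.toarray().nbytes      # the dense copy pays for every cell
160000000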
When the matrix is loaded on macOS, the process takes about 50 GB of VIRT.
I have two questions:

1. Why can macOS load the matrix with only 8 GB of RAM, while the Ubuntu VM with 16 GB of memory plus 10 GB of swap raises a MemoryError?
2. How can I avoid the MemoryError on the Ubuntu instance?
Upvotes: 4
Views: 1001
Reputation: 4234
Thanks @juanpa.arrivillaga, I solved the problem by increasing the swap.
But it is still not perfect: on Ubuntu the matrix uses up physical memory first, while macOS "saves" a lot of memory.
On Ubuntu the process uses much more RES (resident memory) than it does on macOS.
How can I save memory on Ubuntu?
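One way to reduce the memory use is to keep the result sparse instead of letting fit_transform densify it: MultiLabelBinarizer accepts sparse_output=True, and then returns a scipy CSR matrix that stores only the nonzero entries. A sketch, reusing questions_topics from the question:

>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> mb = MultiLabelBinarizer(sparse_output=True)
>>> matrix = mb.fit_transform(questions_topics)  # scipy.sparse CSR, not a dense ndarray
>>> matrix.data.nbytes  # memory cost scales with the nonzeros, not rows * cols

Whether this helps downstream depends on what consumes the matrix, but many scikit-learn estimators accept sparse input directly.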
Upvotes: 0