Yihe

Reputation: 4234

Numpy and memory allocation on Mac OS X vs. Linux

I use NumPy to load a large matrix in 64-bit Python.

It works fine on a MacBook Pro with 8 GB of memory.

>>> import sys
>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> mb = MultiLabelBinarizer()
>>> matrix = mb.fit_transform(questions_topics)
>>> sys.getsizeof(matrix) 
47975472376
>>> matrix.shape
(2999967, 1999)
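The reported size is consistent with a dense float64 array of that shape: rows × cols × 8 bytes, plus a small ndarray header. A quick sanity check using only the shape shown above:

```python
# Dense float64 matrix of shape (2999967, 1999): 8 bytes per element.
rows, cols = 2999967, 1999
data_bytes = rows * cols * 8
print(data_bytes)                 # 47975472264 bytes of raw data
print(data_bytes / 1024**3)      # roughly 44.7 GiB
# sys.getsizeof reported 47975472376; the extra 112 bytes
# are the ndarray object header.
print(47975472376 - data_bytes)  # 112
```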

But it raises a MemoryError on an Ubuntu Google VM instance with 16 GB of memory and 10 GB of swap.

>>> y = mb.fit_transform(questions_topics)
/home/Liwink/anaconda3/lib/python3.5/site-packages/scipy/sparse/base.py in _process_toarray_args(self, order, out)
1037             return out
1038         else:
-> 1039             return np.zeros(self.shape, dtype=self.dtype, order=order)
1040
1041     def __numpy_ufunc__(self, func, method, pos, inputs, **kwargs):
MemoryError:

When the matrix is loaded on Mac OS, it takes 50 GB of VIRT (see the `top` screenshot).

I have two questions:

  1. Where is the matrix (about 50 GB) kept: in memory or on disk?
  2. How can I load this matrix on the VM?
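One way to sidestep the dense allocation entirely is the `sparse_output` flag on `MultiLabelBinarizer`, which makes `fit_transform` return a scipy.sparse CSR matrix instead of a dense array. A minimal sketch, using a toy stand-in for `questions_topics`:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Toy stand-in for questions_topics: each item is a set of topic labels.
questions_topics = [{"python", "numpy"}, {"linux"}, {"python"}]

# sparse_output=True keeps the result in CSR sparse format, so the huge,
# mostly-zero label matrix is never materialized densely.
mb = MultiLabelBinarizer(sparse_output=True)
matrix = mb.fit_transform(questions_topics)

print(matrix.shape)  # (3, 3): three samples, three distinct labels
print(matrix.nnz)    # 4 stored non-zeros
```

Downstream code then has to accept a sparse matrix (or convert small slices with `.toarray()` as needed) rather than the full dense array.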

Upvotes: 4

Views: 1001

Answers (1)

Yihe

Reputation: 4234

Thanks @juanpa.arrivillaga; I solved the problem by increasing the swap space.

But it is still not perfect: Ubuntu fills up physical memory before touching swap, while Mac OS keeps resident memory much lower.

On Ubuntu: [memory-usage screenshot]

On Mac OS: [memory-usage screenshot]

On Ubuntu, the process uses much more RES (resident memory) than it does on Mac OS.

How can I save memory on Ubuntu?
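One way to keep resident memory down on either OS is to never convert the sparse result to a dense array in the first place: a binarized label matrix is mostly zeros, so its sparse form is dramatically smaller. A small illustration with scipy.sparse:

```python
import numpy as np
from scipy import sparse

# A mostly-zero matrix, like a binarized label matrix.
dense = np.zeros((10000, 2000))
dense[::100, 0] = 1.0  # 100 non-zero entries

s = sparse.csr_matrix(dense)

dense_bytes = dense.nbytes
# CSR storage: non-zero values plus their column indices and row pointers.
sparse_bytes = s.data.nbytes + s.indices.nbytes + s.indptr.nbytes

print(dense_bytes)   # 160000000 bytes (10000 * 2000 * 8)
print(sparse_bytes)  # only tens of kilobytes
```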

Upvotes: 0
