Alex

Reputation: 4264

Numpy array vs. C++ vector in memory efficiency

Which object is generally smaller in memory given the exact same data: a numpy array with dtype int64 or a C++ vector of type int? For example:

v = np.array([34, 23])
std::vector<int> v { 34,23 };

Upvotes: 3

Views: 2050

Answers (1)

hpaulj

Reputation: 231385

There are effectively two parts to an np.array: the object overhead, plus attributes like shape and strides, and the data buffer. The first has roughly the same size for all arrays; the second scales with the number of elements (and the size of each element). In numpy the data buffer is 1d, regardless of the array shape.

With only 2 elements, the overhead part of your example array is larger than the data buffer. But with 1000s of elements the proportion goes the other way: the data buffer dominates.
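
A quick way to see both parts from Python (a sketch; the exact numbers vary with platform and numpy version, but ndarray's __sizeof__ counts the object overhead plus the data buffer the array owns):

import sys
import numpy as np

a = np.array([34, 23])
print(a.nbytes)                     # data buffer: 2 elements * a.itemsize
print(sys.getsizeof(a))             # object overhead + data buffer
print(sys.getsizeof(a) - a.nbytes)  # the roughly constant per-array overhead

# the overhead stays fixed while the data buffer scales with length
for n in (2, 1000, 100_000):
    a = np.arange(n)
    print(n, a.nbytes, sys.getsizeof(a) - a.nbytes)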

Saving the array with np.save will give a rough idea of the memory use. That file format writes a header (80 bytes in the session below), and the rest is the data buffer.

I'm less familiar with C++ storage details, though I think they're more transparent (if you know the language): a std::vector<int> is typically a small fixed-size object holding three pointers (about 24 bytes on a 64-bit system), plus a separately allocated heap buffer of capacity * sizeof(int) bytes.

But remember, efficiency in storing one array is only part of the story. In practice you need to think about the memory use when doing math and indexing. The ndarray distinction between view and copy makes it harder to predict just how much memory is being used.
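
To make the view/copy point concrete, here's a minimal sketch (np.shares_memory reports whether two arrays overlap in the same data buffer):

import numpy as np

a = np.arange(1000)
view = a[::2]                      # basic slicing returns a view: no new data buffer
copy = a[[0, 2, 4]]                # advanced (fancy) indexing returns a copy: new buffer
print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, copy))   # False
print(view.base is a)              # True: the view borrows a's memory

So the view adds only the small per-object overhead, while the copy also allocates its own data buffer.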

In [1155]: np.save('test.npy',np.array([1,2]))

In [1156]: ls -l test.npy
-rw-rw-r-- 1 paul paul 88 Jun 30 17:08 test.npy

In [1157]: np.save('test.npy',np.arange(1000))

In [1158]: ls -l test.npy
-rw-rw-r-- 1 paul paul 4080 Jun 30 17:08 test.npy

This looks like 80 bytes of header, and 4*len bytes for the data.
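
That arithmetic can be checked directly (a sketch; note that 4 bytes per element reflects this platform's default integer size, and newer numpy versions pad the header to 128 bytes rather than 80):

import os
import numpy as np

a = np.arange(1000)
np.save('test.npy', a)
total = os.path.getsize('test.npy')
print(total, a.nbytes, total - a.nbytes)  # file size = header + data buffer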

Upvotes: 2
