sffc

Reputation: 6424

Why is there a large overhead in pickling numpy arrays?

Suppose I have a simple array in Python:

>>> import pickle
>>> x = [1.0, 2.0, 3.0, 4.0]

When pickled, it is a reasonably small size:

>>> len(pickle.dumps(x))
44

Why is the size so much larger if I use a numpy array?

>>> import numpy as np
>>> xn = np.array(x)
>>> len(pickle.dumps(xn))
187

Converting it to a less precise data type helps only a little:

>>> x16 = xn.astype('float16')
>>> len(pickle.dumps(x16))
163

Other numpy/scipy data structures like sparse matrices also don't pickle well. Why?

Upvotes: 2

Views: 902

Answers (1)

Akshat Harit

Reputation: 824

Inspecting an ndarray in a debugger shows that, apart from the raw data buffer, it carries metadata such as its dtype, shape, and strides, which a plain Python list does not need to serialize.

A complete list of ndarray attributes can be found at http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html

When pickled, this metadata, plus the information needed to reconstruct the ndarray object, is recorded alongside the data itself, which is why the result is larger. This overhead is roughly constant, though, so it becomes negligible for large arrays.
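You can verify that the overhead is fixed metadata rather than a per-element cost with a quick comparison (a sketch; exact byte counts vary by numpy and pickle version):

```python
import pickle
import numpy as np

# Overhead = pickled size minus the size of the raw data buffer.
small = np.zeros(4)       # 4 * 8 = 32 bytes of float64 data
large = np.zeros(10000)   # 80000 bytes of float64 data

overhead_small = len(pickle.dumps(small)) - small.nbytes
overhead_large = len(pickle.dumps(large)) - large.nbytes

# The two overheads are nearly identical: the extra bytes hold the
# dtype/shape/reconstruction metadata, not something copied per element.
print(overhead_small, overhead_large)
```

For a 4-element array that fixed cost dominates the 32-byte payload, which matches the numbers in the question; for a large array it disappears into the noise.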

Upvotes: 1

Related Questions