sffc

Reputation: 6424

Why is there a large overhead in pickling numpy arrays?

Suppose I have a simple array in Python:

>>> import pickle
>>> x = [1.0, 2.0, 3.0, 4.0]

When pickled, it is a reasonably small size:

>>> len(pickle.dumps(x))
44

Why is the size so much larger if I use a numpy array?

>>> import numpy as np
>>> xn = np.array(x)
>>> len(pickle.dumps(xn))
187

Converting it to a less precise data type helps only a little:

>>> x16 = xn.astype('float16')
>>> len(pickle.dumps(x16))
163

Other numpy/scipy data structures like sparse matrices also don't pickle well. Why?

Upvotes: 2

Views: 902

Answers (1)

Akshat Harit

Reputation: 824

Inspecting an ndarray in a debugger shows that, apart from the raw data buffer, it carries metadata such as its dtype, shape, and strides, which a plain Python list does not need to serialize.

A complete list of ndarray attributes can be found at http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html

When pickled, this metadata, plus the information needed to reconstruct the ndarray object, is recorded alongside the data itself, which is why the result is larger. This overhead is roughly constant, though, so it becomes negligible for large arrays.
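You can verify that the overhead is fixed metadata rather than a per-element cost with a quick comparison (a sketch; exact byte counts vary by numpy and pickle version):

```python
import pickle
import numpy as np

# Overhead = pickled size minus the size of the raw data buffer.
small = np.zeros(4)       # 4 * 8 = 32 bytes of float64 data
large = np.zeros(10000)   # 80000 bytes of float64 data

overhead_small = len(pickle.dumps(small)) - small.nbytes
overhead_large = len(pickle.dumps(large)) - large.nbytes

# The two overheads are nearly identical: the extra bytes hold the
# dtype/shape/reconstruction metadata, not something copied per element.
print(overhead_small, overhead_large)
```

For a 4-element array that fixed cost dominates the 32-byte payload, which matches the numbers in the question; for a large array it disappears into the noise.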

Upvotes: 1

Related Questions