Eypros

Reputation: 5723

Why can numpy save and load objects other than numpy arrays?

I was trying to make another point and accidentally saved a dict using np.save(). To my surprise, there seems to be no problem at all with that approach. I tried the same with another object that is not an np.array, a list, and it seems to work fine too.

For example the following code, saves and loads an object using np.save() and np.load():

import numpy as np

list_file = 'random_list.npy'
random_list = [x*2 for x in range(20)]
np.save(list_file, random_list)

# load numpy array
random_list2 = np.load(list_file)
set(random_list) == set(random_list2)

True

So, my questions are:

  1. Why does this succeed, given that the documentation mentions only arrays?
  2. If it is meant to handle other objects too, which objects can be handled?

I know pickle has some limitations that could affect which objects can be handled, but many points remain unclear.

Edit:
I thought that np.save() was just trying to convert the object passed as a parameter to a numpy array, but that does not make sense in some cases, such as a dict.

For example, a dict passed to np.array does not seem to produce anything useful:

a = {1: 0, 2: 1, 3: 2}
b = np.array(a)
type(b)

numpy.ndarray

b.shape

()

Upvotes: 4

Views: 1252

Answers (2)

ivan_pozdeev

Reputation: 36096

numpy.save() documents its argument as "array-like".

As per the question 'numpy: formal definition of "array_like" objects?', the underlying numpy/core/src/multiarray/ctors.c:PyArray_FromAny() accepts:

/* op is an array */

/* op is a NumPy scalar */

/* op is a Python scalar */

/* op supports the PEP 3118 buffer interface */

/* op supports the __array_struct__ or __array_interface__ interface */

/* op supplies the __array__ function. */

/* Try to treat op as a list of lists */
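A few of these cases can be seen from Python. This is only a rough sketch: it uses np.asanyarray (the conversion named in the execution path in this answer), the Wrapper class is a hypothetical illustration and not part of numpy, and array.array is used here as an assumed example of an object exposing the buffer interface.

```python
import array
import numpy as np


class Wrapper:
    """Hypothetical class (not part of numpy) that supplies __array__."""

    def __init__(self, values):
        self.values = values

    def __array__(self, dtype=None, copy=None):
        # numpy calls this hook to obtain an ndarray from the object
        return np.array(self.values, dtype=dtype)


from_list = np.asanyarray([[1, 2], [3, 4]])                # list of lists
from_scalar = np.asanyarray(3.5)                           # Python scalar
from_buffer = np.asanyarray(array.array('d', [1.0, 2.0]))  # buffer-style object
from_wrapper = np.asanyarray(Wrapper([1, 2, 3]))           # __array__ hook

for name, arr in [("list", from_list), ("scalar", from_scalar),
                  ("buffer", from_buffer), ("wrapper", from_wrapper)]:
    print(name, arr.shape, arr.dtype)
```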

Specifically for dict, the execution path goes like this:

numpy/npyio.py -> numpy/core/numeric.py:asanyarray() -> numpy/core/src/multiarray/multiarraymodule.c:_array_fromobject() -> numpy/core/src/multiarray/ctors.c:PyArray_CheckFromAny() -> the aforementioned PyArray_FromAny. There:

<...>
PyArray_GetArrayParamsFromObject(op, newtype,
                        0, &dtype,
                        &ndim, dims, &arr, context)
<...>
        else {
            if (newtype == NULL) {
                newtype = dtype;    /* object dtype */
<...>
            ret = (PyArrayObject *)PyArray_NewFromDescr(&PyArray_Type, newtype,
                                         ndim, dims,
                                         NULL, NULL,
                                         flags&NPY_ARRAY_F_CONTIGUOUS, NULL);
    return (PyObject *)ret;
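The visible result of that path, as a minimal sketch from the Python side: a dict matches none of the array-like cases, so it ends up as the single element of a 0-d array with object dtype. The variable names here are illustrative only.

```python
import numpy as np

mapping = {1: 0, 2: 1, 3: 2}

# Same conversion that np.save applies to its argument
wrapped = np.asanyarray(mapping)

print(type(wrapped))   # numpy.ndarray
print(wrapped.dtype)   # object
print(wrapped.shape)   # ()

# The array holds a reference to the original dict, not a copy
same_object = wrapped.item() is mapping
print(same_object)
```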

Upvotes: 4

hpaulj

Reputation: 231625

A demo:

In [507]: np.savez('test', a = [x*2 for x in range(3)], b=dict(a=1,b=np.arange(3)))

In [510]: d = np.load('test.npz')
In [511]: d['a']
Out[511]: array([0, 2, 4])

This list was converted to an array and saved.

In [512]: d['b']
Out[512]: array({'a': 1, 'b': array([0, 1, 2])}, dtype=object)
In [513]: d['b'].shape
Out[513]: ()
In [514]: d['b'].item()   # or d['b'][()]
Out[514]: {'a': 1, 'b': array([0, 1, 2])}

The dictionary was wrapped in a 0d object dtype array, and saved with pickle. The array within the dictionary was pickled with save.

np.save uses pickle where needed to handle non-array objects, and pickle uses save to handle array objects.
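A round trip with plain np.save / np.load might look like the sketch below. One caveat that postdates this answer: as far as I know, since NumPy 1.16.3 np.load defaults to allow_pickle=False, so loading an object array needs allow_pickle=True (on older versions the first load below would simply succeed). The file name and variable names are made up for the example.

```python
import os
import tempfile

import numpy as np

data = {"a": 1, "b": np.arange(3)}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dict_demo.npy")

    # The dict is wrapped in a 0-d object array and written with pickle
    np.save(path, data)

    # Recent NumPy versions refuse to unpickle unless asked explicitly
    try:
        np.load(path)
        refused = False
    except ValueError:
        refused = True

    loaded = np.load(path, allow_pickle=True)
    restored = loaded.item()   # unwrap the 0-d object array

print(refused)
print(restored)
```

Only pass allow_pickle=True for files you trust, since unpickling can execute arbitrary code.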

Upvotes: 3
