R zu

Reputation: 2074

How does numpy allocate memory for a nested array?

The following program creates a large array from a nested list of arrays:

import numpy as np
a = np.arange(6).reshape(2, 3)
nested_list = [[a, a + 1], [a + 2, a + 3]]
b = np.array(nested_list)

Does np.array pre-allocate memory only once for the result before copying the data in, in this case?

Or is this similar to:

c = np.vstack([np.hstack([a, a + 1]), np.hstack([a + 2, a + 3])])

which would allocate memory 3 times (once per stack)?

>>> b
array([[[[0, 1, 2],
         [3, 4, 5]],

        [[1, 2, 3],
         [4, 5, 6]]],


       [[[2, 3, 4],
         [5, 6, 7]],

        [[3, 4, 5],
         [6, 7, 8]]]])
>>> c
array([[0, 1, 2, 1, 2, 3],
       [3, 4, 5, 4, 5, 6],
       [2, 3, 4, 3, 4, 5],
       [5, 6, 7, 6, 7, 8]])
>>> b.shape
(2, 2, 2, 3)
>>> b.reshape(2*2, 2*3)
array([[0, 1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5, 6],
       [2, 3, 4, 5, 6, 7],
       [3, 4, 5, 6, 7, 8]])

Upvotes: 0

Views: 331

Answers (1)

hpaulj

Reputation: 231385

nested_list = [[a, a + 1], [a + 2, a + 3]] produces 3 new arrays (the sums) plus a list of pointers to those arrays. That's just basic Python interpreter action.

b = np.array(nested_list): np.array is a complex compiled function, so without some serious digging it is hard to tell exactly what it does. My impression from previous use, and especially errors when components don't exactly match in size, is that it scans the input to determine the highest-dimensional array that it can create, and then plugs the pieces in, with type conversions if needed.
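That shape-scanning behavior is easy to see from the outside. A small sketch (the ragged case assumes NumPy 1.24+, which raises a ValueError rather than silently building an object array):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# Matching shapes: np.array finds the deepest consistent nesting,
# a (2, 2) layout of (2, 3) blocks -> a (2, 2, 2, 3) result
b = np.array([[a, a + 1], [a + 2, a + 3]])
print(b.shape)

# Mismatched shapes: recent numpy refuses to guess a common shape
# unless you pass dtype=object explicitly
try:
    np.array([a, a[:, :2]])          # (2, 3) next to (2, 2)
except ValueError:
    print("ragged input rejected")
```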

It's easy to do time comparisons; harder to track memory use. But assuming that data copying is the biggest time consumer, time tests are probably a good proxy for memory use. And unless we are hitting memory errors, we are usually more concerned with time than memory use.
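For the memory side, one rough probe is the standard-library tracemalloc module; recent NumPy versions report their buffer allocations to it. A sketch (the exact peak number varies by platform and NumPy version):

```python
import tracemalloc
import numpy as np

a = np.arange(6).reshape(2, 3)
alist = [[a, a + 1], [a + 2, a + 3]]

tracemalloc.start()
b = np.array(alist)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# the result holds 2*2*2*3 = 24 elements in one buffer;
# peak tells us how much was allocated along the way
print(b.nbytes, peak)
```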

In [565]: alist = [[a,a+1],[a+2,a+3]]
In [566]: allist = [[a.tolist(), (a+1).tolist()],[(a+2).tolist(), (a+3).tolist()]]

In [567]: timeit np.array(alist)
6.74 µs ± 63.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [568]: timeit np.array(allist)
9.92 µs ± 286 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Working from the nested list of arrays is a bit faster than working from the pure list equivalent. It may be copying those arrays to the target as blocks.
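One thing we can confirm directly: whatever staging np.array uses internally, the result is a fresh buffer, not a view of the inputs:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.array([[a, a + 1], [a + 2, a + 3]])

# b owns its data and overlaps none of the source arrays
assert b.flags['OWNDATA']
assert not np.shares_memory(a, b)

# so mutating a afterwards leaves b untouched
a[0, 0] = 99
assert b[0, 0, 0, 0] == 0
```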

Using individual stacks is noticeably slower, though that timing also includes creating the a+n arrays:

In [569]: timeit c = np.vstack([np.hstack([a, a + 1]), np.hstack([a + 2, a + 3])])
37.8 µs ± 39 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

np.stack acts the same as np.array (with the default axis). It too uses concatenate:

In [570]: timeit np.stack(alist)
28.7 µs ± 262 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
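The two routes give identical results, so the timing difference is purely in how the copies are staged:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
alist = [[a, a + 1], [a + 2, a + 3]]

# stacking along the default axis 0 and the np.array constructor
# agree element for element, with the same (2, 2, 2, 3) shape
assert np.array_equal(np.stack(alist), np.array(alist))
assert np.stack(alist).shape == (2, 2, 2, 3)
```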

Including the a+n calculations in the timing may be fairer:

In [571]: %%timeit
     ...: alist = [[a,a+1],[a+2,a+3]]
     ...: np.stack(alist)
     ...: 
38.6 µs ± 509 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [572]: %%timeit
     ...: alist = [[a,a+1],[a+2,a+3]]
     ...: np.array(alist)
     ...: 
15.7 µs ± 177 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

The new np.block was mentioned - it produces something different and is quite a bit slower:

In [573]: np.block(alist)
Out[573]: 
array([[0, 1, 2, 1, 2, 3],
       [3, 4, 5, 4, 5, 6],
       [2, 3, 4, 3, 4, 5],
       [5, 6, 7, 6, 7, 8]])
In [574]: timeit np.block(alist)
126 µs ± 2.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

block produces the same 2d array as the nested stacks:

np.vstack([np.hstack([a, a + 1]), np.hstack([a + 2, a + 3])])

np.array and np.stack produce a 4d array. It can be reshaped to 2d, but the order of elements is different. To match, we'd need some transposing before reshaping, e.g.

In [590]: np.array(alist).transpose(0,2,1,3).reshape(4,6)
Out[590]: 
array([[0, 1, 2, 1, 2, 3],
       [3, 4, 5, 4, 5, 6],
       [2, 3, 4, 3, 4, 5],
       [5, 6, 7, 6, 7, 8]])
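Putting the layout relationships above side by side as assertions:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
alist = [[a, a + 1], [a + 2, a + 3]]

c = np.vstack([np.hstack([a, a + 1]), np.hstack([a + 2, a + 3])])

# np.block matches the nested hstack/vstack layout directly
assert np.array_equal(np.block(alist), c)

# np.array needs a transpose before the reshape to get the same 2d order:
# (block-row, row-in-block, block-col, col-in-block) -> (4, 6)
assert np.array_equal(np.array(alist).transpose(0, 2, 1, 3).reshape(4, 6), c)
```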

Upvotes: 1
