zegkljan

Reputation: 8401

Why does numpy.broadcast "transpose" results of vstack and similar functions?

Observe:

In [1]: import numpy as np
In [2]: x = np.array([1, 2, 3])
In [3]: np.vstack([x, x])
Out[3]: 
array([[1, 2, 3],
       [1, 2, 3]])

In [4]: np.vstack(np.broadcast(x, x))
Out[4]: 
array([[1, 1],
       [2, 2],
       [3, 3]])

Similarly for column_stack and row_stack (hstack behaves differently in this case but it also differs when used with broadcast). Why?

I'm after the logic behind that rather than finding a way of "repairing" this behavior (I'm just fine with it, it's just unintuitive).

Upvotes: 5

Views: 619

Answers (1)

Alex Riley

Reputation: 176790

np.broadcast returns an instance of an iterator object that describes how the arrays should be broadcast together [1]. Among other things, it describes the shape and the number of dimensions that the resulting array will have.

Crucially, when you actually iterate over this object in Python you get back tuples of elements from each input array:

>>> b = np.broadcast(x, x)
>>> b.shape
(3,)
>>> b.ndim
1
>>> list(b)
[(1, 1), (2, 2), (3, 3)]

This tells us that if we were performing an actual operation on the arrays (say, x+x), NumPy would return a one-dimensional array of shape (3,) and would combine the elements of each tuple to produce the values of the final array (e.g. it would compute 1+1, 2+2, 3+3 for the addition).
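As a rough illustration of that point, the sketch below combines the tuples by hand and compares the result with x+x. The name `manual` is only illustrative, and the list comprehension stands in for what NumPy does internally in C.

```python
import numpy as np

x = np.array([1, 2, 3])
b = np.broadcast(x, x)
shape = b.shape  # (3,), the shape the broadcast result would have

# Iterating over b yields one tuple per output element: (1, 1), (2, 2), (3, 3).
# Adding the members of each tuple reproduces x + x element by element.
manual = np.array([u + v for u, v in b]).reshape(shape)

print(manual)                          # compare with x + x
print(np.array_equal(manual, x + x))
```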

If you dig into the source of vstack, you find that all it does is make sure the elements of the iterable it has been given are at least two-dimensional, and then stack them along axis 0.
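A minimal sketch of that behaviour, assuming nothing beyond np.atleast_2d and np.concatenate: `vstack_sketch` is a hypothetical stand-in written for this answer, not NumPy's actual source. The broadcast object is wrapped in list() so that a plain sequence is passed in.

```python
import numpy as np

def vstack_sketch(arrays):
    """Hypothetical, simplified stand-in for np.vstack (not NumPy's source)."""
    # Promote every element of the iterable to at least two dimensions...
    rows = [np.atleast_2d(a) for a in arrays]
    # ...then join the pieces along the first axis.
    return np.concatenate(rows, axis=0)

x = np.array([1, 2, 3])

# A list of two arrays: each becomes a (1, 3) row, giving shape (2, 3).
from_list = vstack_sketch([x, x])

# A broadcast object: each element is a tuple that becomes a (1, 2) row,
# giving shape (3, 2).
from_broadcast = vstack_sketch(list(np.broadcast(x, x)))

print(from_list.shape, from_broadcast.shape)
```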

In the case of b = np.broadcast(x, x) this means that we get the following arrays to stack:

>>> [np.atleast_2d(_m) for _m in b]
[array([[1, 1]]), array([[2, 2]]), array([[3, 3]])]

These three small arrays are then stacked vertically, producing the output you note.
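The apparent "transpose" is easier to see with two different inputs. In the sketch below, `y` is an extra array introduced only for illustration, and the broadcast object is converted to a list of tuples before stacking so that vstack receives an ordinary sequence.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20, 30])  # illustrative second input, not from the question

# Stacking the arrays themselves: one row per input array.
direct = np.vstack([x, y])

# Stacking the tuples produced by iterating the broadcast object:
# one row per broadcast element, one column per input array.
pairs = list(np.broadcast(x, y))
via_broadcast = np.vstack(pairs)

print(direct)
print(via_broadcast)
print(np.array_equal(via_broadcast, direct.T))
```

For 1-D inputs of equal length, the two results are transposes of each other because the roles of "input array" and "element position" are swapped between rows and columns.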


[1] Exactly how arrays of varying dimensions are iterated over in parallel is at the very heart of how NumPy's broadcasting works. The code can be found mostly in iterators.c. An interesting overview of NumPy's multidimensional iterator, written by Travis Oliphant himself, can be found in the Beautiful Code book.

Upvotes: 5
