dimensions of array of arrays in numpy

Question

I'd like to operate on "jagged arrays", and I'd prefer to write "A + A" instead of "[x + y for x,y in zip(A,A)]".

For that, I'd like to convert a list of arrays of different sizes into a single numpy array, but I ran into an error due to seemingly over-zealous broadcasting (notice that the first three succeeded, but the last one failed):

In[209]: A = array([ones([3,3]), array([1, 2])])
In[210]: A = array([ones([3,3]), array([1, 2])], dtype=object)
In[211]: A = array([ones([3,2]), array([1, 2])], dtype=object)
In[212]: A = array([ones([2,2]), array([1, 2])], dtype=object)
Traceback (most recent call last):
  File "/home/hzhang/.conda/envs/myenv/lib/python3.4/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in 
    A = array([ones([2,2]), array([1, 2])], dtype=object)
ValueError: could not broadcast input array from shape (2,2) into shape (2)

Help?

hpaulj · Accepted Answer

Your case is a variant of the 3rd case in my answer to

How to keep numpy from broadcasting when creating an object array of different shaped arrays

np.array tries to create a multidimensional array of numbers from the input list. If the component dimensions are sufficiently different, it resorts to keeping the arrays separate, making an object array instead. I think of this kind of array as a glorified/debased list.
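A minimal sketch of the two outcomes described above. It assumes a reasonably recent NumPy: since NumPy 1.24, ragged input without dtype=object raises a ValueError instead of quietly producing an object array, so the ragged case below always passes dtype=object.

```python
import numpy as np

# Component arrays of identical shape: np.array stacks them into one 2-D numeric array.
regular = np.array([np.ones(2), np.ones(2)])
print(regular.shape, regular.dtype)  # (2, 2) float64

# Component arrays whose lengths differ at the first level: with dtype=object,
# np.array keeps them as separate elements of a 1-D object array.
ragged = np.array([np.ones(3), np.ones(2)], dtype=object)
print(ragged.shape, ragged.dtype)  # (2,) object
print(ragged[0].shape, ragged[1].shape)  # (3,) (2,)
```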

See also: How to store multiple numpy 1d arrays with different lengths and print it

In your problem case, the dimensions are close enough that it 'thinks' it can make a 2d array, but when it starts to fill in those values it finds that it can't broadcast them into that shape, so it throws the error. One could argue that it should have backtracked and taken the 'object' array route, but that decision tree is buried deep in compiled code.
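A small sketch that reproduces the failing call from the question (In[212]) and catches the error, so you can check the behavior on your own NumPy version. The exact wording of the message varies between versions (older ones print "(2)", newer ones "(2,)"), so no expected output is shown.

```python
import numpy as np

# The problem case from the question: a (2,2) array next to a (2,) array.
# Both have length 2 along the first axis, so np.array goes one level deeper,
# settles on a 2-D result, and then cannot copy the (2,2) block into a row.
try:
    np.array([np.ones((2, 2)), np.array([1, 2])], dtype=object)
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)
```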

The problem case in that earlier SO question was

np.array([np.zeros((2, 2)), np.zeros((2,3))])

The first dimensions match, but the second don't. I'm not entirely sure why your In[211] works but In[212] does not, but the error message is the same, right down to the (2,2) => (2) attempt.
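One way to see the pattern, as a sketch: in these examples the calls that fail are the ones whose component arrays agree in their first dimension but not deeper down. This is an observation from these cases, not a documented rule. The helper name `builds` is hypothetical.

```python
import numpy as np

def builds(parts):
    """Hypothetical helper: return the resulting shape, or None if np.array raises ValueError."""
    try:
        return np.array(parts, dtype=object).shape
    except ValueError:
        return None

# In[211]: leading lengths 3 and 2 differ, so numpy stops at the first level.
print(builds([np.ones((3, 2)), np.array([1, 2])]))   # (2,)
# Earlier SO case: leading lengths match (2 and 2), deeper lengths don't.
print(builds([np.zeros((2, 2)), np.zeros((2, 3))]))  # None
```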

edit

Oops, I first read your problem example as:

np.array([np.ones([2,2]), np.ones([1, 2])], dtype=object)

That is, combining a (2,2) with a (1,2), which does produce a (2,) object array. What you are actually combining is a (2,2) with a (2,).

So it looks like the target is np.empty((2,2),float) (or object), because out[...]=[ones([2,2]), array([1,2])] with an out of that shape produces this same error.

In any case, the most reliable way of creating an object array is to initialize it and then copy the arrays into it:

Out[90]: array([None, None], dtype=object)
In [91]: arr[:]=[ones([2,2]), array([1, 2])]
In [92]: arr
Out[92]: 
array([array([[ 1.,  1.],
       [ 1.,  1.]]), array([1, 2])], dtype=object)

Be cautious about doing math on object arrays like this. What works is hit-or-miss:

In [93]: A+A
Out[93]: 
array([array([[ 2.,  2.],
       [ 2.,  2.],
       [ 2.,  2.]]),
       array([2, 4])], dtype=object)

In [96]: np.min(A[1])
Out[96]: 1
In [97]: np.min(A)
....
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
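A sketch of a workaround, assuming the per-element minimum is what you want. np.min on the object array compares the element arrays with each other, and the boolean array that comparison yields has no single truth value, which is where the ValueError comes from. Reducing inside each element first avoids that.

```python
import numpy as np

# Same contents as A from In[211]: a (3,2) array and a (2,) array.
A = np.empty(2, dtype=object)
A[0] = np.ones((3, 2))
A[1] = np.array([1, 2])

# np.min(A) compares A[0] with A[1] directly and fails.
try:
    np.min(A)
    min_failed = False
except ValueError:
    min_failed = True

# Reduce within each element, then across the per-element results.
per_element = [a.min() for a in A]
overall = min(per_element)
print(per_element, overall)
```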

In [98]: A.sum()
Out[98]: 
array([[ 2.,  3.],
       [ 2.,  3.],
       [ 2.,  3.]])

This works because A[0]+A[1] works: A[1] has shape (2,), which broadcasts against the (3,2) shape of A[0].
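A short sketch confirming that the sum of this object array is the same thing as adding its two elements directly, with the (2,) element broadcast across the rows of the (3,2) one.

```python
import numpy as np

# Same contents as A from In[211]: a (3,2) array and a (2,) array.
A = np.empty(2, dtype=object)
A[0] = np.ones((3, 2))
A[1] = np.array([1, 2])

# sum() adds the elements with their own +, so the value equals A[0] + A[1].
total = A.sum()
print(total.shape)                         # (3, 2)
print(np.array_equal(total, A[0] + A[1]))  # True
```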

With object arrays, numpy resorts to something like a list comprehension, iterating over the object elements. So you may get the convenience of array notation, but not the speed you would get with a true 2d array.
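A sketch comparing the two spellings from the question. On an object array, + is applied element by element (one Python-level addition per element), so the results match the explicit comprehension; this sketch checks equality only and does not measure speed.

```python
import numpy as np

# Same contents as A from In[211]: a (3,2) array and a (2,) array.
A = np.empty(2, dtype=object)
A[0] = np.ones((3, 2))
A[1] = np.array([1, 2])

via_numpy = A + A                            # array notation
via_loop = [x + y for x, y in zip(A, A)]     # the comprehension from the question

same = all(np.array_equal(p, q) for p, q in zip(via_numpy, via_loop))
print(same)  # True
```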
