How do you create a (sometimes) ragged array of arrays in Numpy?

Question

In Numpy, I want to create an array of integer arrays (or lists). Each individual array is a set of indices. These individual arrays generally have different lengths, but sometimes all have the same length.

When the lengths are different, I can create the array as

test = np.array([[1,2],[1,2,3]],dtype=object)

When I do this, test[0] is a list of integers and I can use other_array[test[0]] without issue.

However, when test happens to have entries all the same size and I do

test = np.array([[1,2],[1,3]], dtype=object)

then test[0] is a Numpy array of dtype object. When I use other_array[test[0]] I get an error that arrays used as indices must be of integer (or boolean) type.

Here is a complete example:

import numpy as np

other_array = np.array([0,1,2,3])
test1 = np.array([[1,2],[1,2,3]], dtype=object)
print(other_array[test1[0]]) #this works

test2 = np.array([[1,2],[1,3]], dtype=object)
print(other_array[test2[0]]) #this fails

The only way I have found around this issue is to check if test will be ragged or not before creating it and use dtype=int when it happens to have arrays of all the same size. This seems inefficient. Is there a generic way to create an array of integer arrays that is sometimes ragged and sometimes not without checking for raggedness?

hpaulj · Accepted Answer

To consistently make an object dtype array, you need to initialize one of the right size, and then assign the list to it:

In [86]: res = np.empty(2, object)
In [87]: res
Out[87]: array([None, None], dtype=object)
In [88]: res[:] = [[1,2],[1,2,3]]
In [89]: res
Out[89]: array([list([1, 2]), list([1, 2, 3])], dtype=object)
In [90]: res[:] = [[1,2],[1,3]]
In [91]: res
Out[91]: array([list([1, 2]), list([1, 3])], dtype=object)

You can't assign a (2,n) array this way:

In [92]: res[:] = np.array([[1,2],[1,3]])
Traceback (most recent call last):
  File "", line 1, in 
    res[:] = np.array([[1,2],[1,3]])
ValueError: could not broadcast input array from shape (2,2) into shape (2,)

but a list of arrays works:

In [93]: res[:] = [np.array([1,2]),np.array([1,3])]
In [94]: res
Out[94]: array([array([1, 2]), array([1, 3])], dtype=object)
In [95]: res[:] = list(np.array([[1,2],[1,3]]))
In [96]: res
Out[96]: array([array([1, 2]), array([1, 3])], dtype=object)

The basic point is that np.array() prefers to build a multidimensional numeric dtype array, and treats object dtype as a fall-back option that it uses only when the inputs can't be combined into a regular array. With some combinations of array shapes, np.array will even raise an error rather than create the object dtype array. So create-and-fill is the only approach that behaves consistently.
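The create-and-fill pattern above can be wrapped in a small helper. This is only a sketch: the name make_object_array is hypothetical, and converting each entry to an integer array (rather than leaving it as a list) is an assumption made so that every element can be used directly as an index.

```python
import numpy as np

def make_object_array(seqs):
    """Hypothetical helper: build a 1-D object array of integer index arrays.

    The result has shape (len(seqs),) whether or not the inner
    sequences happen to share a length.
    """
    out = np.empty(len(seqs), dtype=object)
    # Fill element by element so NumPy never tries to broadcast
    # the inner sequences into a 2-D array.
    for i, s in enumerate(seqs):
        out[i] = np.asarray(s, dtype=int)
    return out

other_array = np.array([0, 1, 2, 3])
ragged = make_object_array([[1, 2], [1, 2, 3]])
uniform = make_object_array([[1, 2], [1, 3]])

print(ragged.shape, uniform.shape)  # both are 1-D with two elements
print(other_array[ragged[0]])       # indexing works in the ragged case
print(other_array[uniform[0]])      # and in the equal-length case
```

Because each element is stored as its own integer array, no check for raggedness is needed before building the container.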

Your test1 and test2 behave like this:

In [97]: np.array([[1,2],[1,2,3]], dtype=object)
Out[97]: array([list([1, 2]), list([1, 2, 3])], dtype=object)
In [98]: np.array([[1,2],[1,2,3]], dtype=object)[0]
Out[98]: [1, 2]
In [99]: np.array([[1,2],[1,3]], dtype=object)
Out[99]: 
array([[1, 2],
       [1, 3]], dtype=object)
In [100]: np.array([[1,2],[1,3]], dtype=object)[0]
Out[100]: array([1, 2], dtype=object)
In [103]: np.array([[1,2],[1,3]])[0]
Out[103]: array([1, 2])

But I wonder if there's any need to make an array from the list of lists at all. If you are just using the entries as indices, indexing the list itself works just as well:

In [105]: [[1,2],[1,3]][0]
Out[105]: [1, 2]
In [106]: [[1,2],[1,2,3]][0]
Out[106]: [1, 2]
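As a rough illustration of skipping the object array entirely, the sketch below keeps the index sets in a plain Python list and indexes other_array (the name from the question) with each entry. The variable names index_sets and picked are made up for the example.

```python
import numpy as np

other_array = np.array([0, 1, 2, 3])

# One ragged case and one equal-length case, both kept as plain lists.
for index_sets in ([[1, 2], [1, 2, 3]], [[1, 2], [1, 3]]):
    # Each inner list is a valid integer index, whatever its length.
    picked = [other_array[idx] for idx in index_sets]
    print(picked)
```

The same loop handles both cases because NumPy never sees the outer list, so it has no chance to turn it into a 2-D array.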

Note that np.nonzero (which is what np.where does when called with only a condition) returns a tuple of arrays, one per dimension. This tuple can be used directly as a multidimensional index. np.argwhere applies transpose to that tuple, creating an (n,ndim) array. That looks nice, but it can't be used directly for indexing.
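A small sketch of that difference, using a made-up 2x3 array a. Converting the np.argwhere result back with tuple(idx_rows.T) is one possible way to make it usable as an index, not the only one.

```python
import numpy as np

a = np.array([[0, 5, 0],
              [7, 0, 9]])

# np.nonzero gives one index array per dimension, packed in a tuple.
idx_tuple = np.nonzero(a)
print(a[idx_tuple])           # selects the nonzero values

# np.argwhere gives one row per nonzero element: shape (n, ndim).
idx_rows = np.argwhere(a)
print(idx_rows.shape)

# To index with it, transpose back and rebuild the tuple of arrays.
print(a[tuple(idx_rows.T)])
```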
