Combinatorialist

Reputation: 43

How do you create a (sometimes) ragged array of arrays in Numpy?

In Numpy, I want to create an array of integer arrays (or lists). Each individual array is a set of indices. These individual arrays generally have different lengths, but sometimes all have the same length.

When the lengths are different, I can create the array as

test = np.array([[1,2],[1,2,3]],dtype=object)

When I do this, test[0] is a list of integers and I can use other_array[test[0]] without issue.

However, when test happens to have entries all the same size and I do

test = np.array([[1,2],[1,3]], dtype=object)

then test[0] is a Numpy array of dtype object. When I use other_array[test[0]] I get an error that arrays used as indices must be of integer (or boolean) type.

Here is a complete example:

import numpy as np

other_array = np.array([0,1,2,3])
test1 = np.array([[1,2],[1,2,3]], dtype=object)
print(other_array[test1[0]])  # this works: test1[0] is the list [1, 2]

test2 = np.array([[1,2],[1,3]], dtype=object)
print(other_array[test2[0]])  # this fails: test2[0] is an object-dtype array

The only way I have found around this issue is to check if test will be ragged or not before creating it and use dtype=int when it happens to have arrays of all the same size. This seems inefficient. Is there a generic way to create an array of integer arrays that is sometimes ragged and sometimes not without checking for raggedness?

Upvotes: 4

Views: 6178

Answers (2)

hpaulj

Reputation: 231475

To consistently make an object dtype array, you need to initialize one of the right size, and then assign the list to it:

In [86]: res = np.empty(2, object)
In [87]: res
Out[87]: array([None, None], dtype=object)
In [88]: res[:] = [[1,2],[1,2,3]]
In [89]: res
Out[89]: array([list([1, 2]), list([1, 2, 3])], dtype=object)
In [90]: res[:] = [[1,2],[1,3]]
In [91]: res
Out[91]: array([list([1, 2]), list([1, 3])], dtype=object)

You can't assign a (2,n) array this way:

In [92]: res[:] = np.array([[1,2],[1,3]])
Traceback (most recent call last):
  File "<ipython-input-92-f05200126d48>", line 1, in <module>
    res[:] = np.array([[1,2],[1,3]])
ValueError: could not broadcast input array from shape (2,2) into shape (2,)

but a list of arrays works:

In [93]: res[:] = [np.array([1,2]),np.array([1,3])]
In [94]: res
Out[94]: array([array([1, 2]), array([1, 3])], dtype=object)
In [95]: res[:] = list(np.array([[1,2],[1,3]]))
In [96]: res
Out[96]: array([array([1, 2]), array([1, 3])], dtype=object)

The basic point is that np.array() prefers to build a multidimensional numeric dtype array and only falls back to object dtype when it has to. With some combinations of array shapes, np.array will even raise an error rather than create an object dtype array. So create-and-fill is the only approach that behaves consistently.
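The create-and-fill pattern can be wrapped in a small helper. This is only a sketch: the name to_object_array is made up for illustration and is not a NumPy function. It fills the array one element at a time, so the result should be 1-D whether or not the inner lists share a length.

```python
import numpy as np

def to_object_array(index_lists):
    """Hypothetical helper: 1-D object array holding one list per entry."""
    out = np.empty(len(index_lists), dtype=object)
    for i, idx in enumerate(index_lists):
        out[i] = list(idx)  # store each index set as a plain Python list
    return out

other_array = np.array([0, 1, 2, 3])
ragged = to_object_array([[1, 2], [1, 2, 3]])
even = to_object_array([[1, 2], [1, 3]])

# Both arrays have shape (2,), so element 0 is a list usable as an index.
print(other_array[ragged[0]])
print(other_array[even[0]])
```

Storing plain lists keeps the behavior the same as the ragged case in the question, where test[0] is a list of integers.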

Your test1 and test2:

In [97]: np.array([[1,2],[1,2,3]], dtype=object)
Out[97]: array([list([1, 2]), list([1, 2, 3])], dtype=object)
In [98]: np.array([[1,2],[1,2,3]], dtype=object)[0]
Out[98]: [1, 2]
In [99]: np.array([[1,2],[1,3]], dtype=object)
Out[99]: 
array([[1, 2],
       [1, 3]], dtype=object)
In [100]: np.array([[1,2],[1,3]], dtype=object)[0]
Out[100]: array([1, 2], dtype=object)
In [103]: np.array([[1,2],[1,3]])[0]
Out[103]: array([1, 2])

But I wonder if there's any need to make an array from the list of lists at all. If you are just using the entries as indices, indexing the list itself is just as good:

In [105]: [[1,2],[1,3]][0]
Out[105]: [1, 2]
In [106]: [[1,2],[1,2,3]][0]
Out[106]: [1, 2]
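As a minimal sketch of that idea, assuming the index sets are only ever used to index other_array (the variable name index_sets is made up here):

```python
import numpy as np

other_array = np.array([0, 1, 2, 3])
index_sets = [[1, 2], [1, 3]]  # could equally be ragged, e.g. [[1, 2], [1, 2, 3]]

# Each inner list is an ordinary list of ints, so it indexes directly.
selected = [other_array[idx] for idx in index_sets]
print(selected)
```

No object array is involved, so the ragged and non-ragged cases go through the same code path.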

Note that np.nonzero (aka np.where) returns a tuple of arrays. This can be used directly as a multidimensional index. np.argwhere applies transpose to that tuple, creating an (n,ndim) array. That looks nice, but can't be used for indexing (directly).
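A small illustration of that difference, using a made-up 2x3 array:

```python
import numpy as np

a = np.array([[0, 5, 0],
              [7, 0, 9]])

idx = np.nonzero(a)        # tuple of two arrays: (row indices, column indices)
picked = a[idx]            # the tuple works directly as a multidimensional index

coords = np.argwhere(a)    # one (row, col) pair per nonzero element, shape (3, 2)
# To index with argwhere's output, transpose it back into a tuple of arrays.
picked_again = a[tuple(coords.T)]

print(picked, picked_again)
```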

Upvotes: 4

The.B

Reputation: 381

You may have a good reason to use NumPy for this. To make the failing case work, you could unpack the second list first. This works for both the ragged and the even case, and you don't need any checks.

test2 = np.array([[1,2],*[1,3]], dtype=object)
print(other_array[test2[0]]) #this works
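A quick check of what the starred expression builds may help here. The sketch below only inspects the result. The thing to notice is that *[1,3] splices 1 and 3 in as separate elements, so the second index set is no longer stored as a list.

```python
import numpy as np

other_array = np.array([0, 1, 2, 3])
test2 = np.array([[1, 2], *[1, 3]], dtype=object)  # same as [[1, 2], 1, 3]

print(test2.shape)             # three elements, not two
print(test2[1], test2[2])      # the former [1, 3] is now two separate ints
print(other_array[test2[0]])   # test2[0] is still the list [1, 2]
```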

Upvotes: -1
