Reputation: 43
In NumPy, I want to create an array of integer arrays (or lists), where each inner array is a set of indices. The inner arrays generally have different lengths, but sometimes they all have the same length.
When the lengths are different, I can create the array as
test = np.array([[1,2],[1,2,3]], dtype=object)
When I do this, test[0] is a list of integers and I can use other_array[test[0]] without issue.
However, when the entries of test all happen to have the same length and I do
test = np.array([[1,2],[1,3]], dtype=object)
then test[0] is a NumPy array of dtype object. When I use other_array[test[0]], I get an error that arrays used as indices must be of integer (or boolean) type.
Here is a complete example:
import numpy as np
other_array = np.array([0,1,2,3])
test1 = np.array([[1,2],[1,2,3]], dtype=object)
print(other_array[test1[0]]) # this works
test2 = np.array([[1,2],[1,3]], dtype=object)
print(other_array[test2[0]]) # this fails with IndexError
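To see where the two cases diverge, a short sketch that rebuilds the arrays from the example and inspects their shapes and element types may help. The variable name failed is only illustrative.

```python
import numpy as np

other_array = np.array([0, 1, 2, 3])

test1 = np.array([[1, 2], [1, 2, 3]], dtype=object)  # ragged rows
test2 = np.array([[1, 2], [1, 3]], dtype=object)     # equal-length rows

# Ragged input: a 1-D object array whose elements are the original lists.
print(test1.shape, type(test1[0]))  # (2,) <class 'list'>

# Equal-length input: np.array builds a full 2-D array, so a row is
# itself an ndarray of dtype object, which NumPy rejects as an index.
print(test2.shape, test2[0].dtype)  # (2, 2) object

try:
    other_array[test2[0]]
    failed = False
except IndexError:
    failed = True
print("indexing with test2[0] raised IndexError:", failed)
```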
The only way I have found around this issue is to check whether test will be ragged before creating it, and to use dtype=int when all the arrays happen to have the same size. This seems inefficient. Is there a generic way to create an array of integer arrays that is sometimes ragged and sometimes not, without checking for raggedness?
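For reference, the raggedness-check workaround described in the question might look roughly like this. make_index_array is a hypothetical helper name, and this is only a sketch of the idea, not the asker's actual code. Its cost is one extra pass over the rows to collect their lengths.

```python
import numpy as np

def make_index_array(rows):
    """Hypothetical helper: inspect row lengths first, then pick the dtype."""
    if len({len(r) for r in rows}) <= 1:
        # every row has the same length: a regular 2-D integer array
        return np.array(rows, dtype=int)
    # ragged rows: a 1-D object array holding the original lists
    return np.array(rows, dtype=object)

other_array = np.array([0, 1, 2, 3])
even = make_index_array([[1, 2], [1, 3]])
ragged = make_index_array([[1, 2], [1, 2, 3]])

print(other_array[even[1]])    # [1 3]
print(other_array[ragged[1]])  # [1 2 3]
```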
Upvotes: 4
Views: 6178
Reputation: 231475
To consistently make an object dtype array, you need to initialize one of the right size, and then assign the list to it:
In [86]: res = np.empty(2, object)
In [87]: res
Out[87]: array([None, None], dtype=object)
In [88]: res[:] = [[1,2],[1,2,3]]
In [89]: res
Out[89]: array([list([1, 2]), list([1, 2, 3])], dtype=object)
In [90]: res[:] = [[1,2],[1,3]]
In [91]: res
Out[91]: array([list([1, 2]), list([1, 3])], dtype=object)
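The create-and-fill steps in the session could be wrapped in a small function so that ragged and equal-length inputs go through the same path. to_object_array is a hypothetical name; the body just repeats the np.empty plus slice assignment shown above.

```python
import numpy as np

def to_object_array(rows):
    """Hypothetical helper: always return a 1-D object array, one slot per row."""
    res = np.empty(len(rows), dtype=object)  # create: shape (len(rows),)
    res[:] = rows                            # fill: each slot gets one row
    return res

other_array = np.array([0, 1, 2, 3])

ragged = to_object_array([[1, 2], [1, 2, 3]])
even = to_object_array([[1, 2], [1, 3]])

# Both results are 1-D and their elements are the original lists,
# so either can be used to index other_array.
print(ragged.shape, even.shape)  # (2,) (2,)
print(other_array[even[1]])      # [1 3]
```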
You can't assign a (2,n) array this way:
In [92]: res[:] = np.array([[1,2],[1,3]])
Traceback (most recent call last):
File "<ipython-input-92-f05200126d48>", line 1, in <module>
res[:] = np.array([[1,2],[1,3]])
ValueError: could not broadcast input array from shape (2,2) into shape (2,)
but a list of arrays works:
In [93]: res[:] = [np.array([1,2]),np.array([1,3])]
In [94]: res
Out[94]: array([array([1, 2]), array([1, 3])], dtype=object)
In [95]: res[:] = list(np.array([[1,2],[1,3]]))
In [96]: res
Out[96]: array([array([1, 2]), array([1, 3])], dtype=object)
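One side effect of the list(...) form worth checking: when the source is a regular 2-D integer array, each stored row keeps its integer dtype, so it is accepted as an index. This is a sketch; idx2d is an illustrative name.

```python
import numpy as np

other_array = np.array([0, 1, 2, 3])
idx2d = np.array([[1, 2], [1, 3]])  # regular 2-D integer array

res = np.empty(len(idx2d), dtype=object)
# list(idx2d) splits the 2-D array into a list of 1-D row arrays; each row
# keeps its integer dtype, unlike the rows of np.array(..., dtype=object).
res[:] = list(idx2d)

print(res[0].dtype.kind)    # i
print(other_array[res[1]])  # [1 3]
```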
The basic point is that np.array() prefers to build multidimensional numeric dtype arrays; object dtype is only a fall-back. And with some combinations of array shapes, np.array will raise an error rather than create an object dtype array. So create-and-fill is the only consistent approach.
In [97]: np.array([[1,2],[1,2,3]], dtype=object)
Out[97]: array([list([1, 2]), list([1, 2, 3])], dtype=object)
In [98]: np.array([[1,2],[1,2,3]], dtype=object)[0]
Out[98]: [1, 2]
In [99]: np.array([[1,2],[1,3]], dtype=object)
Out[99]:
array([[1, 2],
       [1, 3]], dtype=object)
In [100]: np.array([[1,2],[1,3]], dtype=object)[0]
Out[100]: array([1, 2], dtype=object)
In [103]: np.array([[1,2],[1,3]])[0]
Out[103]: array([1, 2])
But I wonder if there's any need to make an array from a list of lists at all. If you are just using them as indices, indexing the list is just as good:
In [105]: [[1,2],[1,3]][0]
Out[105]: [1, 2]
In [106]: [[1,2],[1,2,3]][0]
Out[106]: [1, 2]
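A minimal sketch of that list-only approach, reusing other_array from the question; the names even, ragged, and picked_* are illustrative. Since no array is built from the outer list, there is no dtype inference and both inputs behave the same way.

```python
import numpy as np

other_array = np.array([0, 1, 2, 3])

# Keep the index sets as plain lists of lists.
even = [[1, 2], [1, 3]]
ragged = [[1, 2], [1, 2, 3]]

# Each inner list is a valid integer index for other_array.
picked_even = [other_array[idx] for idx in even]
picked_ragged = [other_array[idx] for idx in ragged]

print(picked_even)
print(picked_ragged)
```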
Note that np.nonzero (aka np.where with a single argument) returns a tuple of arrays, which can be used directly as a multidimensional index. np.argwhere applies transpose to that tuple, creating an (n, ndim) array. That looks nice, but it can't be used directly for indexing.
Upvotes: 4
Reputation: 381
You may have a good reason to use NumPy for this; I don't know. To make the failing case work, you could unpack the second list first, which avoids any raggedness check:
test2 = np.array([[1,2],*[1,3]], dtype=object)
print(other_array[test2[0]]) # this works
Note, though, that the unpacking changes the contents: test2 becomes array([list([1, 2]), 1, 3], dtype=object), so test2[1] is 1 rather than [1, 3].
Upvotes: -1