Reputation: 228
A recurring error I run into when using NumPy is that an attempt to index an array fails because one of the dimensions of the array was a singleton, and thus that dimension got wiped out and can't be indexed. This is especially problematic in functions designed to operate on arrays of arbitrary size. I'm looking for the cheapest, most universal way to avoid this error.
Here's an example:
import numpy as np
f = (lambda t, u, i=0: t[:,i]*u[::-1])
a = np.eye(3)
b = np.array([1,2,3])
f(a,b)
f(a[:,0],b[1])
The first call works as expected. The second call fails in two ways: 1) t can't be indexed by [:,0] because it has shape (3,), and 2) u can't be indexed at all because it's a scalar.
Here are the fixes that occur to me:
1) Use np.atleast_1d and np.atleast_2d etc. (possibly with conditionals to make sure that the dimensions are in the right order) inside f to make sure that all parameters have the dimensions they need. This precludes use of lambdas, and can take a few lines that I would rather not need.
2) Instead of writing f(a[:,0],b[1]) above, use f(a[:,[0]],b[[1]]). This is fine, but I always have to remember to put in the extra brackets, and if the index is stored in a variable you might not know whether the extra brackets are needed. E.g.:
idx = 1
f(a[:,[0]],b[[idx]])
idx = [2,0,1]
f(a[:,[0]],b[idx])
In this case, you would seem to have to call np.atleast_1d on idx first, which may be even more cumbersome than putting np.atleast_1d in the function.
3) In some cases I can get away with just not putting in an index. E.g.:
f = lambda t, u: t[0]*u
f(a,b)
f(a[:,0],b[0])
This works, and is apparently the slickest solution when it applies. But it doesn't help in every case (in particular, your dimensions have to be in the right order to begin with).
So, are there better approaches than the above?
Upvotes: 2
Views: 2276
Reputation: 74232
There are lots of ways to avoid this behaviour.
First, whenever you index into a dimension of an np.ndarray with a slice rather than an integer, the number of dimensions of the output will be the same as that of the input:
import numpy as np
x = np.arange(12).reshape(3, 4)
print(x[:, 0].shape)    # integer indexing drops the dimension
# (3,)
print(x[:, 0:1].shape)  # slice keeps the dimension
# (3, 1)
This is my preferred way of avoiding the problem, since it generalizes very easily from single-element to multi-element selections (e.g. x[:, i:i+1] vs x[:, i:i+n]).
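As a rough sketch of how the slice approach might look in practice, here is a small helper (the name take_columns is made up for illustration) together with the lambda from the question, called with slices instead of integers:

```python
import numpy as np

def take_columns(x, i, n=1):
    """Hypothetical helper: return n columns of x starting at column i.

    Because it slices, the result is always 2-D, even when n == 1.
    """
    return x[:, i:i + n]

x = np.arange(12).reshape(3, 4)
single = take_columns(x, 0)      # one column, shape (3, 1)
several = take_columns(x, 1, 2)  # two columns, shape (3, 2)

# The lambda from the question works unchanged if the caller slices
f = lambda t, u, i=0: t[:, i] * u[::-1]
a = np.eye(3)
b = np.array([1, 2, 3])
result = f(a[:, 0:1], b[1:2])    # t has shape (3, 1), u has shape (1,)
```

One thing to watch for: a slice that runs past the end of the array does not raise an error, it silently returns an empty selection (take_columns(x, 10) has shape (3, 0)), so an out-of-range index is harder to notice than with integer indexing.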
As you've already touched on, you can also avoid dimension loss by using any sequence of integers to index into a dimension:
print(x[:, [0]].shape)              # list
# (3, 1)
print(x[:, (0,)].shape)             # tuple
# (3, 1)
print(x[:, np.array((0,))].shape)   # array
# (3, 1)
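If the index lives in a variable that may be either a single integer or a list, as in the question, one possible way to handle both is to pass it through np.atleast_1d before indexing. This is only a sketch, and select_columns is a name invented for the example:

```python
import numpy as np

def select_columns(x, idx):
    """Hypothetical helper: select columns of x, keeping the result 2-D.

    idx may be a single integer or a sequence of integers.
    """
    idx = np.atleast_1d(idx)  # scalar 1 -> array([1]); a list becomes a 1-D array
    return x[:, idx]

x = np.arange(12).reshape(3, 4)
one = select_columns(x, 1)            # shape (3, 1)
many = select_columns(x, [2, 0, 1])   # shape (3, 3), columns reordered
```

Note that indexing with a sequence of integers returns a copy, whereas indexing with a slice returns a view of the original array, so writing into one will not modify x.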
If you choose to stick with integer indices, you can always insert a new singleton dimension using np.newaxis (or equivalently, None):
print(x[:, 0][:, np.newaxis].shape)
# (3, 1)
print(x[:, 0][:, None].shape)
# (3, 1)
Or else you could manually reshape it to the correct size (here using -1 to infer the size of the first dimension automatically):
print(x[:, 0].reshape(-1, 1).shape)
# (3, 1)
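If you would rather fix things up inside the function, the same idea can be wrapped in a helper. The sketch below is one possible version, not the only way to do it: as_column is a made-up name, and treating a scalar as a 1x1 array is an assumption made for the example. It also shows why np.atleast_2d alone may not be what you want, since it adds the new axis at the front.

```python
import numpy as np

def as_column(v):
    """Hypothetical helper: promote scalars and 1-D arrays to column shape."""
    v = np.asarray(v)
    if v.ndim == 0:
        return v.reshape(1, 1)      # assumption: a scalar becomes a 1x1 array
    if v.ndim == 1:
        return v[:, np.newaxis]     # (n,) -> (n, 1)
    return v                        # 2-D and higher are left alone

def f(t, u, i=0):
    """The function from the question, made tolerant of dropped dimensions."""
    t = as_column(t)
    u = np.atleast_1d(u)
    return t[:, i] * u[::-1]

a = np.eye(3)
b = np.array([1, 2, 3])

row_shape = np.atleast_2d(a[:, 0]).shape  # (1, 3): new axis goes first
col_shape = as_column(a[:, 0]).shape      # (3, 1)

full = f(a, b)               # original call
reduced = f(a[:, 0], b[1])   # the call that failed in the question
```

This costs a few extra lines and rules out a lambda, which is the trade-off already mentioned in the question.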
Finally, you can use an np.matrix rather than an np.ndarray. np.matrix behaves more like a MATLAB matrix, where singleton dimensions are left in whenever you index with an integer:
y = np.matrix(x)
print(y[:, 0].shape)
# (3, 1)
However, you should be aware that there are a number of other important differences between np.matrix and np.ndarray. For example, the * operator performs elementwise multiplication on arrays, but matrix multiplication on matrices. In most circumstances it's best to stick to np.ndarray.
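To make the difference in the * operator concrete, here is a small illustration on a 2x2 example (the variable names are just for this snippet):

```python
import numpy as np

x = np.arange(4).reshape(2, 2)   # [[0, 1], [2, 3]]
y = np.matrix(x)

elementwise = x * x              # arrays: each element squared
matrix_product = y * y           # matrices: true matrix multiplication
array_matmul = x @ x             # arrays can use @ for the matrix product

print(elementwise)
print(matrix_product)
```

With plain arrays you get the matrix product from the @ operator (or np.dot), so np.matrix is not needed for that. As far as I know, the NumPy documentation itself now advises against using np.matrix in new code.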
Upvotes: 2