Sesquipedal

Reputation: 228

Dealing with dimension collapse in python arrays

A recurring error I run into when using NumPy is that an attempt to index an array fails because one of the dimensions of the array was a singleton, and thus that dimension got wiped out and can't be indexed. This is especially problematic in functions designed to operate on arrays of arbitrary size. I'm looking for the cheapest, most universal way to avoid this error.

Here's an example:

import numpy as np
f = (lambda t, u, i=0: t[:,i]*u[::-1])
a = np.eye(3)
b = np.array([1,2,3])
f(a,b)
f(a[:,0],b[1])

The first call works as expected. The second call fails in two ways: 1) t can't be indexed by [:,0] because it has shape (3,), and 2) u can't be indexed at all because it's a scalar.

Here are the fixes that occur to me:

1) Use np.atleast_1d and np.atleast_2d etc. (possibly with conditionals to make sure that the dimensions are in the right order) inside f to make sure that all parameters have the dimensions they need. This precludes use of lambdas, and can take a few lines that I would rather not need.
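
A minimal sketch of this approach (the promotion logic inside f is my own illustration, not the only way to do it):

```python
import numpy as np

def f(t, u, i=0):
    # Promote the arguments so indexing always works, even if the
    # caller passed a collapsed 1-D slice or a scalar.
    t = np.atleast_2d(t)  # a (3,) input becomes (1, 3) -- a row vector,
                          # so a transpose may be needed depending on intent
    u = np.atleast_1d(u)  # a scalar input becomes a (1,) array
    return t[:, i] * u[::-1]

a = np.eye(3)
b = np.array([1, 2, 3])

print(f(a, b))            # works as before
print(f(a[:, 0], b[1]))   # no longer raises: t is (1, 3), u is (1,)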

2) Instead of writing f(a[:,0],b[1]) above, use f(a[:,[0]],b[[1]]). This works, but I always have to remember to add the extra brackets, and if the index is stored in a variable, it may not be obvious whether the extra brackets are needed. E.g.:

idx = 1
f(a[:,[0]],b[[idx]])
idx = [2,0,1]
f(a[:,[0]],b[idx])

In this case, you would seem to have to call np.atleast_1d on idx first, which may be even more cumbersome than putting np.atleast_1d in the function.
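
To make that concrete, here is a sketch of wrapping the index with np.atleast_1d so that both the scalar and the list case go through the same code path:

```python
import numpy as np

f = lambda t, u, i=0: t[:, i] * u[::-1]
a = np.eye(3)
b = np.array([1, 2, 3])

for idx in (1, [2, 0, 1]):
    # np.atleast_1d turns a scalar index into a length-1 array,
    # so indexing b with it always preserves the dimension.
    idx = np.atleast_1d(idx)
    print(f(a[:, [0]], b[idx]))
```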

3) In some cases I can get away with just not putting in an index. E.g.:

f = lambda t, u: t[0]*u
f(a,b)
f(a[:,0],b[0])

This works, and is apparently the slickest solution when it applies. But it doesn't help in every case (in particular, your dimensions have to be in the right order to begin with).

So, are there better approaches than the above?

Upvotes: 2

Views: 2276

Answers (1)

ali_m

Reputation: 74232

There are lots of ways to avoid this behaviour.

First, whenever you index into a dimension of an np.ndarray with a slice rather than an integer, the number of dimensions of the output will be the same as that of the input:

import numpy as np

x = np.arange(12).reshape(3, 4)
print(x[:, 0].shape)               # integer indexing
# (3,)

print(x[:, 0:1].shape)             # slice
# (3, 1)

This is my preferred way of avoiding the problem, since it generalizes very easily from single-element to multi-element selections (e.g. x[:, i:i+1] vs x[:, i:i+n]).
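
For instance, a quick sketch of how the same slice syntax covers both the single-element and multi-element case:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
i, n = 1, 2

print(x[:, i:i+1].shape)  # one column, dimension preserved
# (3, 1)

print(x[:, i:i+n].shape)  # n columns, identical syntax
# (3, 2)
```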

As you've already touched on, you can also avoid dimension loss by using any sequence of integers to index into a dimension:

print(x[:, [0]].shape)             # list
# (3, 1)

print(x[:, (0,)].shape)            # tuple
# (3, 1)

print(x[:, np.array((0,))].shape)  # array
# (3, 1)

If you choose to stick with integer indices, you can always insert a new singleton dimension using np.newaxis (or equivalently, None):

print(x[:, 0][:, np.newaxis].shape)
# (3, 1)

print(x[:, 0][:, None].shape)
# (3, 1)

Or else you could manually reshape it to the correct size (here using -1 to infer the size of the first dimension automatically):

print(x[:, 0].reshape(-1, 1).shape)
# (3, 1)
# (3, 1)

Finally, you can use an np.matrix rather than an np.ndarray. np.matrix behaves more like a MATLAB matrix, where singleton dimensions are left in whenever you index with an integer:

y = np.matrix(x)
print(y[:, 0].shape)
# (3, 1)

However, you should be aware that there are a number of other important differences between np.matrix and np.ndarray. For example, the * operator performs elementwise multiplication on arrays, but matrix multiplication on matrices. In most circumstances it's best to stick to np.ndarray.
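
A quick sketch of that difference (note that recent NumPy releases discourage np.matrix in favour of regular arrays):

```python
import numpy as np

x = np.arange(4).reshape(2, 2)   # [[0, 1], [2, 3]]
m = np.matrix(x)

print(x * x)   # ndarray: elementwise multiplication
# [[0 1]
#  [4 9]]

print(m * m)   # matrix: matrix multiplication
# [[ 2  3]
#  [ 6 11]]
```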

Upvotes: 2

Related Questions