Matthew Daws
Matthew Daws

Reputation: 1947

Vectorise numpy code on demand

Suppose I have a very basic function in Python:

def f(x, y):
   return x + y

Then I can call this with scalars, f(1, 5.4) == 6.4 or with numpy vectors of arbitrary (but the same) shape. E.g. this works:

x = np.arange(3)
y = np.array([1,4,2.3])
f(x, y)

which gives an array with entries 1, 5, 4.3.

But what if f is more complicated? For example, xx and yy are 1D numpy arrays here.

def g(x, y):
   return np.sum((xx - x)**2 + (yy - y)**2)

(I hasten to add that I'm not interested in this specific g, but in general strategies...) Then g(5, 6) works fine, but if I want to pass numpy arrays, I seem to have to write a very different function with explict broadcasting etc. For example:

def gg(x, y):
   xfull = np.stack([x]*len(xx),axis=-1)
   yfull = np.stack([y]*len(xx),axis=-1)
   return np.sum((xfull - xx)**2 + (yfull - yy)**2, axis=-1)

This does now work with scalars and arrays. But it seems like a mess, and is hard to read.

Is there a better way?

Upvotes: 0

Views: 50

Answers (1)

hpaulj
hpaulj

Reputation: 231615

Given:

def g(x, y):
    return np.sum((xx - x)**2 + (yy - y)**2)

my first questions are:

  • this is written with scalar x and y in mind?
  • what are xx and yy? You say 1d arrays. Same length?
  • why aren't they parameters? Because in this context they are fixed?
  • in words, this offsets xx and yy by constant amounts and takes the sum of their squares, returning a single value?

My next step is to explore the 'broadcasting' limits of this expression. For example it runs for any x that can be used in xx-x. That could be a 0d array, a one element 1d array, an array with the same shape as xx, or anything else that can 'broadcast' with `xx. That's where a thorough understanding of 'broadcasting' is essential.

g(1,2)
g(xx,xx)
g(xx[:,None],yy[None,:])

xx-xx[:,None] though produces a 2d array. np.sum as written takes the sum over all values, i.e. a flattened. Your gg suggests you want to sum on the last axis. If so go ahead and put that in g

def g(x, y):
    return np.sum((xx - x)**2 + (yy - y)**2, axis=-1)

Your use of stack in gg produces:

In [101]: xx
Out[101]: array([0, 1, 2, 3, 4])
In [103]: np.stack([np.arange(3)]*len(xx), axis=-1)
Out[103]: 
array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2]])

I would have written that as x[:,None]

In [104]: xx-_
Out[104]: 
array([[ 0,  1,  2,  3,  4],
       [-1,  0,  1,  2,  3],
       [-2, -1,  0,  1,  2]])
In [105]: xx-np.arange(3)[:,None]
Out[105]: 
array([[ 0,  1,  2,  3,  4],
       [-1,  0,  1,  2,  3],
       [-2, -1,  0,  1,  2]])

That does not work with scalar x; but this does

xx-np.asarray(x)[...,None]

np.array or np.asarray is commonly used as the start of numpy functions to accommodate scalar or list inputs. ... is handy when dealing with a variable number of dimensions. reshape(...,-1) and [...,None] are widely used to expand or generalize dimensions.

I've learned a lot by looking the Python code of numpy functions. I've also learned from years of work with MATLAB to be pedantic about dimensions. Keep track of intended and actual array shapes. It helps to use test shapes that will highlight errors. Test with a (2,3) array instead of an ambiguous (3,3) one.

Upvotes: 1

Related Questions