How to do numpy apply_along_axis when function returns higher-dimensional array?

Question

I have an array of shape (10, 100000) and a function f that takes an array of shape (100000,) to an array of shape (200,200). What is the simplest way to apply f to each of the 10 rows to get an array of shape (10,200,200)? I was hoping to use apply_along_axis, but it seems it will not work because the dimensionality of the output of f is higher than the dimensionality of the input.

(More generally, given an array a of shape (x1,...,xn,y1,...,ym) and a function f that takes arrays of shape (y1,...,ym) to arrays of shape (z1,...,zp), you might want to apply f to the last m dimensions of a, for every setting of the first n dimensions of a, to get an array of shape (x1,...,xn,z1,...,zp). Or you may have a problem that can be transposed to one of this form. What is the best way to do transformations like these?)

hpaulj · Accepted Answer

My first thought is to reshape a, collapsing the first n dimensions down to one. Then it's just a matter of iterating on that dimension, applying f to each subarray. Collect the results in a list (or an array of the right size). Finally reshape.
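A minimal sketch of that reshape / iterate / reshape idea for the general case in the question. The helper name apply_to_trailing and its parameter n (the number of leading "x" dimensions) are hypothetical, and it assumes f returns the same shape for every subarray and that a has at least one subarray to process.

```python
import numpy as np

def apply_to_trailing(f, a, n):
    """Hypothetical helper: apply f over the trailing dims of a, looping over the first n.

    f must map arrays of shape a.shape[n:] to arrays of one fixed shape (z1, ..., zp).
    """
    lead = a.shape[:n]                          # (x1, ..., xn)
    flat = a.reshape((-1,) + a.shape[n:])       # collapse x1..xn into a single axis
    results = [f(sub) for sub in flat]          # apply f to each (y1, ..., ym) subarray
    out = np.stack(results)                     # shape (x1*...*xn, z1, ..., zp)
    return out.reshape(lead + out.shape[1:])    # restore the x1..xn dimensions

# Example: 2x3 grid of length-4 vectors, f maps each vector to its 4x4 outer product.
a = np.arange(24).reshape(2, 3, 4)
res = apply_to_trailing(lambda v: np.outer(v, v), a, 2)
print(res.shape)  # (2, 3, 4, 4)
```

For the question's case this would be apply_to_trailing(f, a, 1) on the (10, 100000) array, giving (10, 200, 200).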

As you describe it, the x1...xn dimensions are just along for the ride.

Look at the code for apply_along_axis. It iterates over all the axes except the one whose 1d slices are passed to the function. It's not doing anything that you can't do just as well with your own iteration. It would handle the iteration over x1...xn, but it would require the y dimensions to be collapsed down to one, and a function that returns the same shape on every call.
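For what it's worth, newer NumPy releases (1.13 onward, if I remember correctly) let func1d return an N-d array: apply_along_axis replaces the processed axis with the dimensions of the returned array. A minimal sketch with a scaled-down, hypothetical stand-in for the question's data (10 rows of length 6 instead of 100000, and an f that returns 2x3 instead of 200x200). It still loops in Python internally, so it is a convenience, not a speedup.

```python
import numpy as np

# Scaled-down stand-in for the (10, 100000) input.
a = np.arange(60, dtype=float).reshape(10, 6)

def f(row):
    # Hypothetical f: maps a length-6 row to a 2x3 array.
    return row.reshape(2, 3) * 2

# Axis 1 (length 6) is replaced by f's output shape (2, 3).
out = np.apply_along_axis(f, 1, a)
print(out.shape)  # (10, 2, 3)
```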

The core of that function is

res = func1d(arr[tuple(i.tolist())], *args, **kwargs)
outarr[tuple(ind)] = res

where outarr has been initialized to the right size, and ind is stepped over all the dimensions except one. At the position of that one axis, ind holds a slice object, which marks where res goes.
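A stripped-down sketch of that loop, assuming (as in the snippet) that func1d returns a 1d array of the same length for every slice. The name along_axis_1d is hypothetical and this is an illustration of the idea, not NumPy's actual source: np.ndindex steps over every axis except axis, and slice(None) is inserted at axis so each assignment fills one 1d line of outarr.

```python
import numpy as np

def along_axis_1d(func1d, axis, arr):
    """Hypothetical re-creation of the apply_along_axis loop for 1d results."""
    arr = np.asarray(arr)
    axis = axis % arr.ndim
    other = arr.shape[:axis] + arr.shape[axis + 1:]   # the axes we iterate over
    outarr = None
    for idx in np.ndindex(*other):
        ind = list(idx)
        ind.insert(axis, slice(None))                 # the slice marks where res goes
        ind = tuple(ind)
        res = np.asarray(func1d(arr[ind]))
        if outarr is None:
            # Size the output from the first result: same shape as arr,
            # except the processed axis takes the length of res.
            shape = list(arr.shape)
            shape[axis] = res.shape[0]
            outarr = np.empty(shape, dtype=res.dtype)
        outarr[ind] = res
    return outarr

a = np.arange(6).reshape(2, 3)
print(along_axis_1d(np.cumsum, 0, a))
# [[0 1 2]
#  [3 5 7]]
```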

=====================

A simple example starting with a 2d input array:

In [933]: def foo(arr):
     ...:     return arr.reshape(2,-1)
     ...: 
In [934]: source=np.arange(12).reshape(3,4)
In [935]: dest=np.zeros((source.shape[0],2,2),source.dtype)
In [936]: for i,r in enumerate(source):
     ...:     dest[i,...] = foo(r)
     ...:     
In [937]: dest
Out[937]: 
array([[[ 0,  1],
        [ 2,  3]],

       [[ 4,  5],
        [ 6,  7]],

       [[ 8,  9],
        [10, 11]]])

So this iterates on the rows of source, generates new arrays, and inserts them in the right place of the destination. To set up the destination I have to know what foo produces (dimension-wise).
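If you don't want to work out foo's output shape by hand, one option (a sketch, assuming source has at least one row and foo returns the same shape and dtype for every row) is to call foo once on the first row and size the destination from that result.

```python
import numpy as np

def foo(arr):
    return arr.reshape(2, -1)

source = np.arange(12).reshape(3, 4)

# Probe foo with the first row to learn its output shape and dtype.
first = foo(source[0])
dest = np.empty((source.shape[0],) + first.shape, dtype=first.dtype)
dest[0] = first
# Fill the remaining rows in place.
for i in range(1, source.shape[0]):
    dest[i] = foo(source[i])
```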

The list append approach doesn't require that much knowledge:

In [938]: dest=[]
In [939]: for i,r in enumerate(source):
     ...:     dest.append(foo(r))    
In [940]: dest
Out[940]: 
[array([[0, 1],
        [2, 3]]), array([[4, 5],
        [6, 7]]), array([[ 8,  9],
        [10, 11]])]
In [941]: np.array(dest)
...
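The same list approach can be written as a comprehension. This sketch swaps in np.stack for the final join; it adds the new leading axis explicitly and raises a ValueError if the per-row results don't all have the same shape.

```python
import numpy as np

def foo(arr):
    return arr.reshape(2, -1)

source = np.arange(12).reshape(3, 4)

# Collect one (2, 2) result per row, then join them along a new axis 0.
dest = np.stack([foo(r) for r in source])
print(dest.shape)  # (3, 2, 2)
```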

It comes down to the old question, 'how do I generate a new array from a function?'
