ffgg

Reputation: 174

Dask apply_along_axis error, comparison with Numpy

I am trying to apply a function to a Dask array using apply_along_axis. The same function works on a NumPy array but fails on the equivalent Dask array. Here is an example:

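import numpy
import dask.array as da

w = numpy.array([[6, 7, 8], [9, 10, 11]])
q = numpy.array([[1, 2, 3], [4, 5, 6]])
s = numpy.stack([w, q])  # shape (2, 2, 3)

def func(arr):
    # arr is a 1-D slice along the chosen axis; it must hold at least two elements
    t, y = arr[0], arr[1]
    return t + y

s_dask = da.from_array(s)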

Applying func with numpy.apply_along_axis works as expected, whereas da.apply_along_axis on the Dask array throws an error: IndexError: index 1 is out of bounds for axis 0 with size 1

>>> s
array([[[ 6,  7,  8],
        [ 9, 10, 11]],

       [[ 1,  2,  3],
        [ 4,  5,  6]]])

>>> numpy.apply_along_axis(func,0,s)
array([[ 7,  9, 11],
       [13, 15, 17]])

>>> da.apply_along_axis(func,0,s_dask)
Traceback (most recent call last):
  File "<pyshell#151>", line 1, in <module>
    da.apply_along_axis(func,0,s_dask)
  File "..Python37\lib\site-packages\dask\array\routines.py", line 383, in apply_along_axis
    test_result = np.array(func1d(test_data, *args, **kwargs))
  File "<pyshell#149>", line 2, in func
    t, y = a[0],a[1]
IndexError: index 1 is out of bounds for axis 0 with size 1

I am not sure what I am doing wrong here.

Upvotes: 2

Views: 268

Answers (1)

MRocklin

Reputation: 57271

Dask array is trying to figure out what the dtype of the output array will be. To do this, it sends a small test array through your function (of size 1, judging by the traceback). That call fails because your function assumes the input has at least two elements.
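To see the failure without Dask involved, here is a minimal sketch that calls the question's function on a length-1 array. The name `probe` and the use of `np.ones` are my assumptions, standing in for the `test_data` seen in the traceback. Only the "size 1" part comes from the error message itself.

```python
import numpy as np

def func(arr):
    # Same logic as the function in the question: needs at least two elements.
    return arr[0] + arr[1]

# Hypothetical stand-in for the array Dask uses to infer the output dtype.
# The length of 1 is taken from "size 1" in the traceback; the rest is assumed.
probe = np.ones((1,), dtype=np.int64)

try:
    func(probe)
    probe_error = None
except IndexError as exc:
    # Indexing arr[1] on a length-1 array raises the same kind of error
    # that appears in the question.
    probe_error = str(exc)

print(probe_error)
```

So the function itself is fine for the real data. It only breaks on the short array used for inference.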

You can save Dask the trouble by providing the dtype explicitly.

da.apply_along_axis(func, 0, s_dask, dtype=s_dask.dtype)
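One caveat, hedged because it depends on the installed version: in the Dask releases I am aware of that accept these keywords, apply_along_axis takes both `dtype` and `shape`, and the test call appears to be skipped only when both are given. The sketch below assumes such a release. `shape=()` is my choice here, reflecting that `func` returns a scalar for each 1-D slice.

```python
import numpy as np
import dask.array as da

w = np.array([[6, 7, 8], [9, 10, 11]])
q = np.array([[1, 2, 3], [4, 5, 6]])
s = np.stack([w, q])

def func(arr):
    # Adds the first two elements of each 1-D slice along the chosen axis.
    return arr[0] + arr[1]

s_dask = da.from_array(s)

# Assumption: this Dask version accepts dtype= and shape= on apply_along_axis.
# shape=() declares that func returns a scalar per slice.
lazy = da.apply_along_axis(func, 0, s_dask, dtype=s_dask.dtype, shape=())
result = lazy.compute()
print(result)
```

If your version rejects the `shape` keyword, or still raises the IndexError, checking the signature with `help(da.apply_along_axis)` should show which keywords it supports.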

Upvotes: 3
