Reputation: 488
Is it feasible to "daskify" a pointwise function written in terms of general numpy operations?
Case example + Partial Solution:
For example, see here: https://github.com/SciTools/iris/pull/2964
The point is that we want to apply a generalised array operation from another library, but it can only operate on actual numpy arrays.
We want it to operate on an existing dask array and produce a lazy result for which sub-arrays can be computed efficiently.
That is why the code uses da.from_array.
...
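For concreteness, here is a minimal sketch of that pattern, assuming a hypothetical numpy-only routine numpy_only_op standing in for the real library call (the actual implementation is in the PR linked above):

    import numpy as np
    import dask.array as da

    def numpy_only_op(arr):
        # Stand-in for the external library routine that only accepts
        # real numpy arrays (hypothetical example).
        return np.asarray(arr) * 2

    class ArraylikeWrapper:
        """Wrap a dask array so that indexing computes only the
        requested region and applies the numpy-only operation to it."""
        def __init__(self, dask_array):
            self.dask_array = dask_array
            self.shape = dask_array.shape
            self.dtype = dask_array.dtype
            self.ndim = dask_array.ndim

        def __getitem__(self, keys):
            # Compute just the indexed region, then apply the operation.
            sub_array = np.asarray(self.dask_array[keys])
            return numpy_only_op(sub_array)

    # Re-wrap as a lazy dask array: sub-arrays are computed on demand.
    lazy_input = da.arange(1_000_000, chunks=100_000)
    lazy_result = da.from_array(ArraylikeWrapper(lazy_input),
                                chunks=lazy_input.chunks)

The drawback, discussed under "Remaining Problem" below, is that the wrapped dask array is invisible to the outer graph: each chunk access triggers its own nested compute of the inner array.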
Alternatives:
You could instead use a deferred (dask.delayed) call, but then it must evaluate the whole argument every time, even if the result is sub-indexed.
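A rough sketch of why that is, reusing the same hypothetical numpy_only_op as above (assumed names, not the PR's actual code):

    import numpy as np
    import dask
    import dask.array as da

    def numpy_only_op(arr):
        # Hypothetical numpy-only routine, as in the sketch above.
        return np.asarray(arr) * 2

    lazy_input = da.arange(1_000_000, chunks=100_000)

    # Wrap the whole call as a single delayed task, then view the
    # result as a one-chunk dask array.
    delayed_result = dask.delayed(numpy_only_op)(lazy_input)
    lazy_result = da.from_delayed(delayed_result,
                                  shape=lazy_input.shape,
                                  dtype=lazy_input.dtype)

    # Sub-indexing is still lazy, but computing even a small piece runs
    # numpy_only_op on the *entire* input, because the graph contains
    # only one big task.
    small_piece = lazy_result[:10].compute()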
Or you could use frompyfunc
http://dask.pydata.org/en/latest/array-api.html#dask.array.frompyfunc
But that wraps a scalar function, not an array function, which is inefficient, especially as it returns an array of objects rather than of numbers.
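For illustration, a minimal sketch of that approach, assuming a hypothetical per-element function scalar_op (not the PR's actual code):

    import dask.array as da

    def scalar_op(x):
        # Hypothetical per-element (scalar) version of the operation.
        return x * 2

    lazy_input = da.arange(1_000_000, chunks=100_000)

    # da.frompyfunc builds a ufunc-like wrapper around a scalar Python
    # function: one Python call per element, and an object-dtype result.
    ufunc_like = da.frompyfunc(scalar_op, 1, 1)
    lazy_result = ufunc_like(lazy_input)

    print(lazy_result.dtype)  # object -- usually needs an astype() afterwards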
Remaining Problem:
In the above partial solution, the missing piece is the ability to "see through" the opaque point-calculation wrapper, so its dask arguments are visible to the whole graph.
Perhaps there is a way in Dask to expose the dask_array argument which is currently hidden inside the from_array(ArraylikeWrapper(dask_array)) construction?
Upvotes: 0
Views: 94
Reputation: 57271
Have you tried da.map_blocks?
x = x.map_blocks(func)
Dask also supports NumPy ufuncs with the __array_ufunc__ protocol if you're able to create those (though map_blocks is likely easier).
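A slightly fuller usage sketch of the map_blocks suggestion, again with a hypothetical numpy-only routine standing in for the real operation; this works when the function is pointwise, so each chunk can be processed independently:

    import numpy as np
    import dask.array as da

    def numpy_only_op(block):
        # Hypothetical numpy-only routine applied to each chunk.
        return np.asarray(block) * 2

    x = da.arange(1_000_000, chunks=100_000)

    # map_blocks applies the function lazily, chunk by chunk; only the
    # chunks needed for a sub-indexed result are ever computed.
    y = x.map_blocks(numpy_only_op, dtype=x.dtype)

    print(y[:10].compute())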
Upvotes: 1