Reputation: 505
I have a question: why is the function passed to map_blocks run an extra time? When I run the example below:
import dask.array as da
import numpy as np

def derivative(x):
    print(x.shape)
    return x - np.roll(x, 1)

x = np.array([1, 1, 2, 3, 3, 3, 2, 1, 1])
d = da.from_array(x, chunks=5)
y = d.map_blocks(derivative)
res = y.compute()
I obtain this output:
(1L,)
(5L,)
(4L,)
Since my chunks are ((5, 4),), I assume that the derivative function is somehow run once before it is actually executed on these chunks. Am I right?
I am using Python 2.7 and dask 0.13.0.
Upvotes: 1
Views: 113
Reputation: 57281
If you do not supply a dtype to the map_blocks call, it will try running your function on a tiny sample array to infer the output dtype (hence the singleton shape). You can avoid this by passing a dtype explicitly if you know it:
y = d.map_blocks(derivative, dtype=d.dtype)
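To make the mechanism concrete, here is a minimal NumPy-only sketch of the idea. It is not dask's actual code: infer_dtype and map_chunks are hypothetical helpers written for illustration, and the one-element sample array is an assumption based on the (1L,) shape in the question. It records the shapes the function is called with, with and without an explicit dtype.

```python
import numpy as np

calls = []  # shapes the function has been called with


def derivative(x):
    calls.append(x.shape)
    return x - np.roll(x, 1)


def infer_dtype(func, ndim, dtype):
    # Hypothetical helper (not dask's API): call func on a tiny array
    # with one element per dimension and read the dtype of the result.
    sample = np.ones((1,) * ndim, dtype=dtype)
    return func(sample).dtype


def map_chunks(func, chunks, dtype=None):
    # Hypothetical stand-in for map_blocks over a list of NumPy chunks.
    # The extra call to func only happens when no dtype is given.
    if dtype is None:
        dtype = infer_dtype(func, chunks[0].ndim, chunks[0].dtype)
    return [func(c).astype(dtype) for c in chunks]


x = np.array([1, 1, 2, 3, 3, 3, 2, 1, 1])
chunks = [x[:5], x[5:]]  # same split as chunks=5 in the question

map_chunks(derivative, chunks)
shapes_inferred = list(calls)  # [(1,), (5,), (4,)]

del calls[:]
map_chunks(derivative, chunks, dtype=x.dtype)
shapes_explicit = list(calls)  # [(5,), (4,)]
```

In this sketch the extra (1,) call disappears once a dtype is supplied, which matches the behaviour described above. Other dask versions may probe the function differently, so treat the exact sample shape as illustrative.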
Upvotes: 1