Ales

Reputation: 505

Why does map_blocks run my function an extra time?

Why is the function I pass to map_blocks called one more time than there are chunks? When I run the example below:

 import dask.array as da
 import numpy as np

 def derivative(x):
     print(x.shape)  # report the shape of every block the function receives
     return x - np.roll(x, 1)

 x = np.array([1, 1, 2, 3, 3, 3, 2, 1, 1])
 d = da.from_array(x, chunks=5)  # 9 elements -> two chunks of sizes 5 and 4
 y = d.map_blocks(derivative)
 res = y.compute()

I obtain this output:

 (1L,)
 (5L,)
 (4L,)

Since my chunks are ((5, 4),), I assume the derivative function is somehow run once before it is actually executed on these chunks. Am I right?

I am using Python 2.7 and dask 0.13.0.

Upvotes: 1

Views: 113

Answers (1)

MRocklin

Reputation: 57281

If you do not supply a dtype to the map_blocks call, dask will run your function on a tiny sample dataset to work out the output dtype (hence the singleton shape). You can avoid this by passing the dtype explicitly if you know it:

 y = d.map_blocks(derivative, dtype=d.dtype)
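To make the mechanism concrete, here is a minimal numpy-only sketch. `toy_map_blocks` is a hypothetical helper, not dask's real implementation: it assumes, following the answer's description, that when no dtype is given the function is first called on a singleton sample (assumed here to be an array of ones with shape `(1,) * ndim`) to discover the output dtype, and it simply slices the input into fixed-size chunks along the first axis.

```python
import numpy as np


def toy_map_blocks(func, arr, chunk_size, dtype=None):
    """Hypothetical stand-in for map_blocks; NOT dask's actual code."""
    if dtype is None:
        # Assumed inference step: call func on a tiny sample of shape (1,) * ndim
        sample = np.ones((1,) * arr.ndim, dtype=arr.dtype)
        dtype = func(sample).dtype
    # Apply func to each chunk along the first axis, then stitch the results
    blocks = [func(arr[i:i + chunk_size]) for i in range(0, len(arr), chunk_size)]
    return np.concatenate(blocks).astype(dtype)


seen_shapes = []


def derivative(x):
    seen_shapes.append(x.shape)  # record shapes instead of printing them
    return x - np.roll(x, 1)


x = np.array([1, 1, 2, 3, 3, 3, 2, 1, 1])

toy_map_blocks(derivative, x, 5)
print(seen_shapes)  # [(1,), (5,), (4,)]

del seen_shapes[:]
toy_map_blocks(derivative, x, 5, dtype=x.dtype)
print(seen_shapes)  # [(5,), (4,)]
```

Recording the shapes in a list makes the two call patterns easy to compare: without a dtype the sketch makes one extra call on a `(1,)` sample, and once the dtype is supplied only the real chunks are visited.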

Upvotes: 1
