Reputation: 300
I would like to use dask to do the following operation; let say I have a numpy array:
In: x = np.arange(5)
Out: [0,1,2,3,4]
Then I want a function to map np.arange
to all the elements of my array.
I have already defined a function for that purpose:
def list_range(array, no_cell):
return np.add.outer(array, np.arange(no_cell)).T
# e.g
In: list_range(x,3)
Out: array([[0, 1, 2, 3, 4],
[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6]])
Now I want to reproduce this in parallel using map_blocks
on a dask array but I always get an error. Here is my attempt based on the dask documentation of map_blocks:
constant = 4
d = da.arange(5, chunks=(2,))
f = da.core.map_blocks(list_range, d, constant, chunks=(2,))
f.compute()
I get
ValueError: could not broadcast input array from shape (4,2) into shape (4)
Upvotes: 2
Views: 338
Reputation: 579
Have you checked out Dask's ufunc methods? For your problem, you can try,
da.add.outer(d, np.arange(constant)).T.compute()
While using map_blocks
, you have to make sure that you specify the new dimensions when your operation results in a change in chunk dimensions. In your problem, the chunk dimension is no more (2,), and instead is (2,4). This new dimension should be specified using the new_axis
parameter. Also, I found that map_blocks is not vstacking the blocks after map_blocks, and I couldn't get the transpose to work within the mapped function. Try this to make map_blocks work,
def list_range(array, no_cell):
return np.add.outer(array, np.arange(no_cell))
constant = 4
d = da.arange(5, chunks=(2,))
f=da.core.map_blocks(list_range, d, constant, chunks=(2,constant), new_axis=[1])
f.T.compute()
Upvotes: 1