jmamath
jmamath

Reputation: 300

parallelize numpy arange on dask array

I would like to use dask to do the following operation; let say I have a numpy array:
In: x = np.arange(5)
Out: [0,1,2,3,4]
Then I want a function to map np.arange to all the elements of my array. I have already defined a function for that purpose:

def list_range(array, no_cell):
    return np.add.outer(array, np.arange(no_cell)).T

# e.g
In: list_range(x,3)
Out: array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6]])

Now I want to reproduce this in parallel using map_blocks on a dask array but I always get an error. Here is my attempt based on the dask documentation of map_blocks:

constant = 4
d = da.arange(5, chunks=(2,))
f = da.core.map_blocks(list_range, d, constant, chunks=(2,))
f.compute()

I get ValueError: could not broadcast input array from shape (4,2) into shape (4)

Upvotes: 2

Views: 338

Answers (1)

TavoloPerUno
TavoloPerUno

Reputation: 579

Have you checked out Dask's ufunc methods? For your problem, you can try,

da.add.outer(d, np.arange(constant)).T.compute()

While using map_blocks, you have to make sure that you specify the new dimensions when your operation results in a change in chunk dimensions. In your problem, the chunk dimension is no more (2,), and instead is (2,4). This new dimension should be specified using the new_axis parameter. Also, I found that map_blocks is not vstacking the blocks after map_blocks, and I couldn't get the transpose to work within the mapped function. Try this to make map_blocks work,

def list_range(array, no_cell):
    return np.add.outer(array, np.arange(no_cell))

constant = 4
d = da.arange(5, chunks=(2,))


f=da.core.map_blocks(list_range, d, constant, chunks=(2,constant), new_axis=[1])
f.T.compute()

Upvotes: 1

Related Questions