Nihilum

Reputation: 701

Parallelize function over 3D array chunked with Dask

I have a dataset ds (dimensions mid_date, y, x) that holds a 3D array (ds.v). I want to apply a function to each y,x cell of the array; for each cell it takes the vector along mid_date and returns a shorter vector:

    import numpy as np

    # Load the input array into memory
    array_input = ds.v.values

    # Choose an output length m <= array_input.shape[0]
    m = 10

    # Takes a vector of size array_input.shape[0] and returns a vector of size m
    def func(pt_in, m):
        # Reduce the dimensionality of the input
        pt_out = pt_in[:m]
        return pt_out

    # Apply the function to every y,x cell of array_input and store the results in an output array
    array_output = np.zeros((m, array_input.shape[1], array_input.shape[2]))
    for i in range(array_input.shape[1]):
        for j in range(array_input.shape[2]):
            array_output[:, i, j] = func(array_input[:, i, j], m)

I have two objectives:

  1. Parallelize this function so it runs over the y, x dimensions of my input array and stores the results in an output array of shape (m, y, x).
  2. Use Dask so I can apply this function chunk by chunk rather than loading the entire dataset into memory.

I tried using dask_array.map_blocks, but my real function has a few more inputs (6 in total). As in the example above, only the first input changes from call to call: the cell being reduced. I did not manage to use my function with map_blocks, because the function takes a single cell of a chunk as its argument, whereas map_blocks passes it the whole chunk.

How can I achieve this?

PS: the datacube I'm working with is hosted here: url = 'http://its-live-data.s3.amazonaws.com/datacubes/v02/N50W140/ITS_LIVE_vel_EPSG3413_G0120_X-3350000_Y350000.zarr' (beware, it's big)

Upvotes: 0

Views: 181

Answers (1)

ThomasNicholas

Reputation: 1382

You need xarray.apply_ufunc. It's a complicated but extremely powerful function that can map practically any behavior you want over xarray objects. If you read through the section of the xarray tutorial documentation on apply_ufunc, you should see examples that are similar in structure to your problem (i.e. 2 input core dimensions and potentially one new dimension on the output).
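As a rough illustration, here is a minimal sketch of how that could look for the toy function in the question. It is one possible reading of the problem, not a tested solution for the real datacube: it treats mid_date as the input core dimension, and the names reduce_cube and the output dimension "m" are made up for the example. The small synthetic array stands in for ds.v. It assumes dask is installed and that each chunk holds the full mid_date axis, which dask="parallelized" requires for a core dimension.

```python
import numpy as np
import xarray as xr


def func(pt_in, m):
    # Same toy reduction as in the question: keep the first m values of one cell's series
    return pt_in[:m]


def reduce_cube(v, m):
    # Hypothetical helper; "m" as the name of the new output dimension is an arbitrary choice
    out = xr.apply_ufunc(
        func,
        v,
        kwargs={"m": m},                 # constant extra arguments are passed as keywords
        input_core_dims=[["mid_date"]],  # func consumes the whole mid_date axis of one cell
        output_core_dims=[["m"]],        # and returns a new axis of length m
        vectorize=True,                  # loop func over every (y, x) cell
        dask="parallelized",             # apply block by block when v is dask-backed
        output_dtypes=[v.dtype],
        dask_gufunc_kwargs={"output_sizes": {"m": m}},
    )
    # apply_ufunc moves core dimensions to the end, so restore the (m, y, x) order
    return out.transpose("m", "y", "x")


# Small synthetic stand-in for ds.v (not the real ITS_LIVE cube)
data = np.arange(20 * 4 * 6, dtype="float64").reshape(20, 4, 6)
v = xr.DataArray(data, dims=("mid_date", "y", "x"))

# mid_date must sit in a single chunk because it is a core dimension;
# y and x can be split so that blocks are processed independently
v = v.chunk({"mid_date": -1, "y": 2, "x": 3})

result = reduce_cube(v, m=10)    # lazy, nothing is computed yet
array_output = result.compute()  # runs the computation chunk by chunk
```

The other constant inputs of the real function could presumably go into kwargs in the same way, since only the first argument varies per cell. Note that vectorize=True still calls func once per cell in a Python loop inside each block, so the speed-up comes from dask processing blocks in parallel, not from the loop itself.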

Upvotes: 0
