Reputation: 3961
Is it possible to have a multi-node Dask cluster be the compute for a PythonScriptStep with AML Pipelines?
We have a PythonScriptStep that uses featuretools' deep feature synthesis (dfs) (docs). ft.dfs() has a parameter, n_jobs, which allows for parallelization. When we run on a single machine, the job takes three hours, but it runs much faster on a Dask cluster. How can I operationalize this within an Azure ML pipeline?
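For context, a minimal sketch of the n_jobs parameter in question, using featuretools' built-in mock customer dataset (the dataset and the pre-1.0 dfs signature with target_entity are assumptions; the question's actual entityset is not shown):

```python
import featuretools as ft

# Load featuretools' bundled demo data as an EntitySet (illustrative only)
es = ft.demo.load_mock_customer(return_entityset=True)

# n_jobs > 1 parallelizes feature computation across local worker
# processes; with a dask.distributed cluster attached, the same
# computation can be spread across multiple nodes.
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_entity="customers",
    n_jobs=2,  # run deep feature synthesis with 2 parallel workers
)
```

On a single machine, n_jobs fans work out over local processes; the question is how to point this at a multi-node Dask cluster from within an AML pipeline step.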
Upvotes: 4
Views: 539
Reputation: 878
We've been working on and recently released a dask_cloudprovider.AzureMLCluster that might be of interest to you: link to repo. You can install it via pip install dask-cloudprovider.
AzureMLCluster instantiates a Dask cluster on the AzureML service, with the elasticity to scale up to hundreds of nodes should you require that. The only required parameter is the Workspace object, but you can pass your own ComputeTarget should you choose to.
You can find an example of how to use it here. In that example I use my custom GPU/RAPIDS Docker image, but you can use any image via the Environment class.
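A minimal sketch of what this looks like, assuming a workspace config.json is present and Azure credentials are set up (the compute target name is hypothetical; this provisions real cloud resources, so it won't run without an Azure ML workspace):

```python
from azureml.core import Workspace
from dask_cloudprovider import AzureMLCluster
from dask.distributed import Client

# Workspace is the only required argument
ws = Workspace.from_config()  # reads config.json from the working directory

cluster = AzureMLCluster(
    ws,
    # compute_target=ws.compute_targets["my-cluster"],  # optional, hypothetical name
    # environment_definition=my_env,                    # optional custom Environment/image
)

# Point Dask at the cluster; subsequent Dask-aware work (including
# ft.dfs with a distributed scheduler) runs across the cluster's nodes
client = Client(cluster)

# ... run your workload, then tear down:
# client.close()
# cluster.close()
```

Inside a PythonScriptStep, the step's script can create the cluster and client at startup, run the dfs computation against it, and close both before the step exits.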
Upvotes: 6