user13800089

Submitting a Databricks notebook run specifying a cluster pool?

I'm submitting a one-time, throw-away notebook job with:

azuredatabricks.net/api/2.0/jobs/runs/submit

$json = @"
{
    "run_name": "integration testing notebook task",
    "existing_cluster_id": "$global:clusterID",
    "timeout_seconds": 3600,
    "notebook_task": {
        "notebook_path": "$global:notebookPath"
    }
}
"@

However, rather than specifying an existing cluster ID (which I had to create myself initially), I want the job to use a cluster from an existing pool. How is this possible? The schema doesn't seem to accept instance_pool_id for this request.

Upvotes: 1

Views: 693

Answers (1)

Alex Ott

Reputation: 87069

You need to use a new_cluster definition instead, and inside it specify the instance_pool_id, the same way as for normal clusters. Something like this:

$json = @"
{
    "run_name": "integration testing notebook task",
    "new_cluster": {
      "spark_version": "7.3.x-scala2.12",
      "node_type_id": "r3.xlarge",
      "aws_attributes": {
        "availability": "ON_DEMAND"
      },
      "num_workers": 10,
      "instance_pool_id": "$global:poolID"
    },
    "timeout_seconds": 3600,
    "notebook_task": {
        "notebook_path": "$global:notebookPath"
    }
}
"@

But note that this creates a new cluster using machines drawn from the pool; it does not attach the run to a cluster that is already allocated there.
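For comparison, here is a minimal Python sketch of the same runs/submit payload. The `pool_id` and `notebook_path` values are hypothetical placeholders standing in for the `$global:poolID` / `$global:notebookPath` variables in the PowerShell snippets above:

```python
import json

# Hypothetical placeholder values (substitute your own pool and notebook).
pool_id = "pool-1234"
notebook_path = "/Shared/integration-test"

# Body for POST /api/2.0/jobs/runs/submit: a new_cluster definition that
# draws its machines from the instance pool, instead of existing_cluster_id.
payload = {
    "run_name": "integration testing notebook task",
    "new_cluster": {
        "spark_version": "7.3.x-scala2.12",
        "num_workers": 10,
        "instance_pool_id": pool_id,
    },
    "timeout_seconds": 3600,
    "notebook_task": {"notebook_path": notebook_path},
}

body = json.dumps(payload)
print(body)
```

Serializing through a dict like this avoids the quoting pitfalls of hand-built JSON strings (such as the stray colon fixed above); the resulting `body` is what you would send as the POST request body with your usual HTTP client and bearer token.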

Upvotes: 0
