NateH06

Reputation: 3594

Deploying a Single Node Cluster to a Databricks Workflow using Asset Bundles

I manage my Workflows in Databricks using Databricks Asset Bundles, and I use job_clusters. I'm trying to change a Workflow to use a Single Node cluster, but I cannot figure out the right YAML; I keep getting errors. Here's the cluster-configuration section of my YML that fails:

  job_clusters:
    - job_cluster_key: my_job_cluster
      new_cluster:
        spark_version: 14.3.x-scala2.12
        azure_attributes:
          first_on_demand: 1
          availability: ON_DEMAND_AZURE
          spot_bid_max_price: -1
        node_type_id: Standard_D8ds_v5
        spark_env_vars:
          PYSPARK_PYTHON: /databricks/python3/bin/python3
        enable_elastic_disk: true
        data_security_mode: SINGLE_USER
        runtime_engine: PHOTON
        num_workers: 0

However, the deployment fails. Here's the output I get:

2024-08-14T21:44:54.4106253Z Error: cannot update job: NumWorkers could be 0 only for SingleNode clusters. See https://docs.databricks.com/clusters/single-node.html for more details
2024-08-14T21:44:54.4106718Z   with databricks_job.My_Workflow_Name,
2024-08-14T21:44:54.4106988Z   on bundle.tf.json line 1160, in resource.databricks_job.My_Workflow_Name:
2024-08-14T21:44:54.4107196Z 1160:       },

The configuration deploys successfully if num_workers is anything other than 0.

If it helps, I have a personal compute cluster with essentially the configuration I need in my Workflow. For comparison, here is its JSON from the Databricks UI:

{
    "cluster_name": "My Personal Compute Cluster",
    "spark_version": "14.3.x-scala2.12",
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*, 4]"
    },
    "azure_attributes": {
        "first_on_demand": 1,
        "availability": "ON_DEMAND_AZURE",
        "spot_bid_max_price": -1
    },
    "node_type_id": "Standard_DS4_v2",
    "driver_node_type_id": "Standard_DS4_v2",
    "custom_tags": {
        "ResourceClass": "SingleNode"
    },
    "autotermination_minutes": 53,
    "enable_elastic_disk": true,
    "init_scripts": [
        {
            "workspace": {
                "destination": "/Shared/init.sh"
            }
        }
    ],
    "single_user_name": "[email protected]",
    "policy_id": "id_goes_here",
    "enable_local_disk_encryption": false,
    "data_security_mode": "SINGLE_USER",
    "runtime_engine": "STANDARD",
    "num_workers": 0,
    "apply_policy_default_values": false
}

Can anyone tell me what the YAML needs to be so that my Workflow uses a cluster like the one in the JSON?

Upvotes: 2

Views: 784

Answers (1)

Ganesh Chandrasekaran

Reputation: 1936

Single User is different from Single Node: data_security_mode: SINGLE_USER controls who can access the cluster, not how many nodes it has.

If you want num_workers to be 0, you are asking the cluster to run on a single node (just the driver). Include spark.databricks.cluster.profile: singleNode in spark_conf, along with the ResourceClass: SingleNode custom tag, as given below. That should solve the problem for you.

          new_cluster:
            spark_version: 14.3.x-scala2.12
            # Node type shown here is an AWS example; on Azure use e.g. Standard_D8ds_v5
            node_type_id: i3.xlarge
            num_workers: 0
            spark_conf:
              # These two settings mark the cluster as Single Node
              "spark.databricks.cluster.profile": "singleNode"
              "spark.master": "local[*, 4]"
            custom_tags:
              "ResourceClass": "SingleNode"

Upvotes: 3
