Reputation: 3594
I manage my Workflows in Databricks using Databricks Asset Bundles, and I use job_clusters. I'm trying to change a Workflow to use a Single Node cluster, but I cannot figure out the YAML to use and I keep getting errors. Here's the section of my YAML containing the cluster configuration, which fails:
job_clusters:
  - job_cluster_key: my_job_cluster
    new_cluster:
      spark_version: 14.3.x-scala2.12
      azure_attributes:
        first_on_demand: 1
        availability: ON_DEMAND_AZURE
        spot_bid_max_price: -1
      node_type_id: Standard_D8ds_v5
      spark_env_vars:
        PYSPARK_PYTHON: /databricks/python3/bin/python3
      enable_elastic_disk: true
      data_security_mode: SINGLE_USER
      runtime_engine: PHOTON
      num_workers: 0
However, the deployment fails. Here's the output I get:
2024-08-14T21:44:54.4106253Z Error: cannot update job: NumWorkers could be 0 only for SingleNode clusters. See https://docs.databricks.com/clusters/single-node.html for more details
2024-08-14T21:44:54.4106718Z   with databricks_job.My_Workflow_Name,
2024-08-14T21:44:54.4106988Z   on bundle.tf.json line 1160, in resource.databricks_job.My_Workflow_Name:
2024-08-14T21:44:54.4107196Z   1160: },
The configuration works if num_workers is anything but 0.
If it helps, I have a personal compute cluster with essentially the configuration I need in my Workflow. Here's its JSON from the Databricks UI for comparison:
{
  "cluster_name": "My Personal Compute Cluster",
  "spark_version": "14.3.x-scala2.12",
  "spark_conf": {
    "spark.databricks.cluster.profile": "singleNode",
    "spark.master": "local[*, 4]"
  },
  "azure_attributes": {
    "first_on_demand": 1,
    "availability": "ON_DEMAND_AZURE",
    "spot_bid_max_price": -1
  },
  "node_type_id": "Standard_DS4_v2",
  "driver_node_type_id": "Standard_DS4_v2",
  "custom_tags": {
    "ResourceClass": "SingleNode"
  },
  "autotermination_minutes": 53,
  "enable_elastic_disk": true,
  "init_scripts": [
    {
      "workspace": {
        "destination": "/Shared/init.sh"
      }
    }
  ],
  "single_user_name": "[email protected]",
  "policy_id": "id_goes_here",
  "enable_local_disk_encryption": false,
  "data_security_mode": "SINGLE_USER",
  "runtime_engine": "STANDARD",
  "num_workers": 0,
  "apply_policy_default_values": false
}
Can anyone tell me what the YAML needs to be so my Workflow uses a cluster like the one in this JSON?
Upvotes: 2
Views: 784
Reputation: 1936
Single User is different from Single Node.
If you want num_workers to be 0, you need to run the cluster on a single node (just the driver). Set spark.databricks.cluster.profile to singleNode in spark_conf, along with spark.master and the ResourceClass custom tag, as shown below. That should solve the problem for you.
new_cluster:
  node_type_id: i3.xlarge  # AWS node type; on Azure, use yours, e.g. Standard_D8ds_v5
  num_workers: 0
  spark_version: 14.3.x-scala2.12
  spark_conf:
    "spark.databricks.cluster.profile": "singleNode"
    "spark.master": "local[*, 4]"
  custom_tags:
    "ResourceClass": "SingleNode"
Upvotes: 3