Reputation: 49
I want to scale up the Spark cluster so that all the worker nodes are up and running before I start my processing. The issue is that the autoscaling of worker nodes does not happen immediately on load, which is leading to worker node crashes. The cluster has 32 nodes, but the load is concentrated on only 4 nodes, which then crash. What I am trying to do is write a few lines of code at the start of the Python notebook that will kick-start the remaining nodes, so that 24 nodes are up and running before the actual data processing begins. Is this possible using code? Please advise.
Upvotes: 2
Views: 1975
Reputation: 12788
In general, autoscale is meant for interactive workloads. I've rarely seen it provide benefits in jobs, though marketing does a good job of selling it as a cost-saving feature.
You can use Databricks Jobs to create an automated cluster. When you run a job on a new automated cluster, Databricks terminates the cluster when the job is complete.
If you know better than autoscale when scaling up should happen, you can use the resize API: https://docs.databricks.com/dev-tools/api/latest/clusters.html#resize
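As a rough sketch, you could call that endpoint from the first cell of your notebook before kicking off the processing. This assumes a workspace URL and a personal access token are available (here read from environment variables, which are placeholders you would supply yourself), and that the cluster ID of the attached cluster can be read from the Spark conf:

```python
import os
import requests

# Assumed to be configured by you: workspace URL and a personal access token.
host = os.environ["DATABRICKS_HOST"]    # e.g. "https://<your-workspace>.cloud.databricks.com"
token = os.environ["DATABRICKS_TOKEN"]  # personal access token with cluster permissions

# In a Databricks notebook, the ID of the currently attached cluster can
# typically be read from the Spark conf; otherwise paste it in directly.
cluster_id = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")

# Ask the Clusters resize API to bring the cluster up to 24 workers
# before the heavy processing starts.
resp = requests.post(
    f"{host}/api/2.0/clusters/resize",
    headers={"Authorization": f"Bearer {token}"},
    json={"cluster_id": cluster_id, "num_workers": 24},
)
resp.raise_for_status()
```

The resize call returns as soon as the request is accepted; the new workers still take a few minutes to provision, so you may want to poll the clusters/get endpoint (or simply wait) until the cluster reports the expected size before starting the job.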
Upvotes: 2