Reputation: 193
I am currently unable to spin up any clusters in our Databricks environment on AWS.
When I attempt to start up an on-demand cluster, it remains in "pending" for 20+ minutes (on relatively small clusters which usually take 2-3 min to start up).
Similarly, all of my scheduled jobs are failing due to their job clusters not being able to start either. This is a sample error message:
Run result unavailable: job failed with error message Unexpected failure while waiting for the cluster [cluster_name] to be ready. Cause Cluster [cluster_name] is unusable since the driver is unhealthy.
When I try to investigate the issue, the driver logs are completely empty. I have tried to initiate clusters with runtimes 9.1 and 10.4 and see the same issue.
Has anyone seen this before? Is this a Databricks issue or an AWS issue?
Upvotes: 2
Views: 1527
Reputation: 322
This is a pretty vague error message, so there are two options I use for troubleshooting that work most of the time.
Upvotes: 0
Reputation: 6812
Has anyone seen this before? Is this a Databricks issue or an AWS issue?
Yes, I have seen this before. In almost all cases it was a cloud provider problem that resolved itself within a few hours. I have also seen it after a networking change where a new VPC was set up. If your networking has not changed and the problem persists, I would open a support ticket with Databricks.
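Since the driver logs are empty, one thing that may help before opening a ticket is pulling the cluster's state from the Clusters API, which sometimes carries a more specific message than the job error. Below is a rough sketch, assuming a personal access token and the `GET /api/2.0/clusters/get` endpoint; the helper names `fetch_cluster_info` and `summarize_cluster`, and the host, token, and cluster ID values, are placeholders I made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def fetch_cluster_info(host: str, token: str, cluster_id: str) -> dict:
    """Call GET /api/2.0/clusters/get and return the parsed JSON body."""
    query = urllib.parse.urlencode({"cluster_id": cluster_id})
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/clusters/get?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def summarize_cluster(info: dict) -> str:
    """Build a one-line summary from the fields most likely to explain a stuck start."""
    state = info.get("state", "UNKNOWN")
    message = info.get("state_message", "")
    reason = info.get("termination_reason") or {}
    parts = [f"state={state}"]
    if message:
        parts.append(f"message={message}")
    if reason.get("code"):
        parts.append(f"termination_code={reason['code']}")
    return "; ".join(parts)


# Example usage (placeholder values, replace with your own workspace details):
# info = fetch_cluster_info("https://<your-workspace-host>", "<token>", "<cluster-id>")
# print(summarize_cluster(info))
```

If the summary shows a termination code or state message that points at instance launch or networking, that would fit the cloud provider or VPC explanation above; if it shows nothing useful, that output is still worth attaching to the support ticket.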
Upvotes: 1