Reputation: 21
I have set up Datalab to run on a Dataproc master node using the Datalab initialization action:
gcloud dataproc clusters create <CLUSTER_NAME> \
--initialization-actions gs://<GCS_BUCKET>/datalab/datalab.sh \
--scopes cloud-platform
This has historically worked fine. However, as of 30.5 I can no longer get any code to run, no matter how simple; I just see the "Running" progress bar. No timeouts, no error messages. How can I debug this?
Upvotes: 2
Views: 262
Reputation: 1349
I just created a cluster and it seemed to work for me.
Just seeing "Running" usually means that there is not enough room in the cluster to schedule a Spark application. Datalab loads PySpark when Python starts, and that creates a YARN application. Any code will block until that YARN application is scheduled.
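To confirm this is what's happening, you can SSH into the master node and check whether the notebook's YARN application is stuck waiting for resources (a sketch; `<CLUSTER_NAME>` is the placeholder from the question, and the master node is conventionally named with an `-m` suffix):

```shell
# SSH to the Dataproc master node
gcloud compute ssh <CLUSTER_NAME>-m

# List YARN applications; an application stuck in the ACCEPTED state
# is waiting for the cluster to free up resources
yarn application -list -appStates ACCEPTED,RUNNING
```

If you see the notebook's application sitting in ACCEPTED while another application holds all the cluster's memory, that matches the symptom described.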
On the default 2-node cluster of n1-standard-4 workers with default configs, there can only be one Spark application at a time. You should be able to fit two notebooks by setting --properties spark.yarn.am.memory=1g, or by using a larger cluster, but you will still eventually hit a limit on running notebooks per cluster.
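Putting that property together with the original command, cluster creation would look roughly like this (cluster name and bucket are the placeholders from the question):

```shell
gcloud dataproc clusters create <CLUSTER_NAME> \
  --initialization-actions gs://<GCS_BUCKET>/datalab/datalab.sh \
  --scopes cloud-platform \
  --properties spark.yarn.am.memory=1g
```

Lowering the application master memory leaves more room in YARN for a second Spark application, at the cost of less headroom per driver.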
Upvotes: 3