djgcp

Reputation: 303

Get the Dataproc cluster name from within PySpark code

From within PySpark code running on a Dataproc cluster, is it possible to get the name of the Dataproc cluster it is running on?

Upvotes: 2

Views: 331

Answers (1)

Dagang Wei

Reputation: 26498

The Dataproc cluster name is available as the VM metadata attribute attributes/dataproc-cluster-name. You can read it in either of two ways; a Python sketch for use inside PySpark follows the list.

  1. CLI

     /usr/share/google/get_metadata_value attributes/dataproc-cluster-name

  2. HTTP

     curl -H "Metadata-Flavor: Google" \
       "http://metadata.google.internal/computeMetadata/v1/instance/attributes/dataproc-cluster-name"

For regular clusters (non personal-auth), you can also infer the cluster name from the VM hostname: just remove the suffix starting at -m or -w, as in the sketch below.
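For example, a sketch of that hostname-based approach (the exact hostname pattern, e.g. my-cluster-m or my-cluster-w-0, and the regex are my assumptions):

    import re
    import socket

    # Hostnames look like "<cluster>-m", "<cluster>-m-<n>" (HA masters) or "<cluster>-w-<n>";
    # strip the trailing role suffix to recover the cluster name.
    cluster_name = re.sub(r"-(m|w)(-\d+)?$", "", socket.gethostname())
    print(cluster_name)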

Upvotes: 1
