Reputation: 465
I want to get the flags that were used when creating a Dataproc cluster from within a Spark job.
For example, I created my cluster using this command line:
gcloud dataproc clusters create cluster-name \
--region=region \
--bucket=bucket-name \
--temp-bucket=bucket-name \
other args ...
In my Scala Spark job I want to get the bucket name and the other arguments. How can I do that? I know that if I want to get the arguments (configuration) of my job, I can do this:
val sc = sparkSession.sparkContext
val conf_context = sc.getConf.getAll // all Spark configuration entries as (key, value) pairs
conf_context.foreach(println)
Any help, please?
Thanks
Upvotes: 2
Views: 333
Reputation: 32660
You can use the gcloud dataproc clusters describe shell command to get details about the cluster:
gcloud dataproc clusters describe $clusterName --region $clusterRegion
To get the bucket name from this command, you can use grep:
BUCKET_NAME=$(gcloud dataproc clusters describe $clusterName \
--region $clusterRegion \
| grep 'configBucket:' \
| sed 's/.* //')
You should be able to execute this from Scala; see this post for how to do it.
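In case it helps, here is a minimal Scala sketch of that approach using scala.sys.process to run gcloud from the driver. The cluster name and region are placeholder values, and it assumes the gcloud CLI is installed and authenticated on the machine running the driver (which it is on a Dataproc master node):

import scala.sys.process._

// Placeholder values; replace with your actual cluster name and region.
val clusterName = "cluster-name"
val clusterRegion = "region"

// Run the describe command and capture its stdout.
// Assumes the gcloud CLI is available on the driver node.
val describeOutput: String =
  Seq("gcloud", "dataproc", "clusters", "describe", clusterName,
      "--region", clusterRegion).!!

// Pick out the configBucket line and keep the value after the colon.
val bucketName: Option[String] = describeOutput
  .split("\n")
  .find(_.trim.startsWith("configBucket:"))
  .map(_.split(":", 2)(1).trim)

bucketName.foreach(println)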
Upvotes: 1
Reputation: 608
Dataproc also publishes some attributes, including the bucket name, to GCE instance Metadata. You can also specify your own Metadata. See https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/metadata.
These attributes are available to you through the metadata server. For example, to read the bucket name, you can run:
curl -s -H Metadata-Flavor:Google http://metadata.google.internal/computeMetadata/v1/instance/attributes/dataproc-bucket
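If you prefer to stay inside the Spark job, here is a minimal Scala sketch that reads the same attribute directly from the metadata server using plain JDK HTTP. It only works on a Dataproc/GCE VM, where metadata.google.internal is resolvable:

import java.net.URL
import scala.io.Source

// Metadata server endpoint for the Dataproc staging bucket attribute.
// Only reachable from inside a GCE VM (e.g. a Dataproc node).
val metadataUrl = new URL(
  "http://metadata.google.internal/computeMetadata/v1/instance/attributes/dataproc-bucket")

val conn = metadataUrl.openConnection()
conn.setRequestProperty("Metadata-Flavor", "Google") // header required by the metadata server

val bucketName = Source.fromInputStream(conn.getInputStream).mkString.trim
println(bucketName)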
Upvotes: 2