Chaouki

Reputation: 465

How to get gcloud dataproc create flags in a Spark job?

I want to get the flags that were used when creating a Dataproc cluster from within a Spark job.

For example, I created my cluster using this command line:

gcloud dataproc clusters create cluster-name \
--region=region \
--bucket=bucket-name \
--temp-bucket=bucket-name \
other args ...

In my Scala Spark job I want to get the bucket name and the other arguments. How can I do that? I know that if I want to get the arguments of my job, I can do this:

val sc = sparkSession.sparkContext
val conf_context = sc.getConf.getAll
conf_context.foreach(println)

Any help, please?

Thanks

Upvotes: 2

Views: 333

Answers (2)

blackbishop

Reputation: 32660

You can use the gcloud dataproc clusters describe shell command to get details about the cluster:

gcloud dataproc clusters describe $clusterName --region $clusterRegion

To get the bucket name from this command, you can use grep:

BUCKET_NAME=$(gcloud dataproc clusters describe $clusterName \
--region $clusterRegion \
| grep 'configBucket:' \
| sed 's/.* //')

You should be able to execute this from Scala; see this post for how to do it.
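A minimal sketch of that approach in Scala, assuming the gcloud CLI is available on the driver node; the cluster name and region values here are placeholders:

import scala.sys.process._

// Run `gcloud dataproc clusters describe` and capture its YAML output.
val clusterName = "cluster-name"   // placeholder
val clusterRegion = "region"       // placeholder
val describeOutput = Seq(
  "gcloud", "dataproc", "clusters", "describe", clusterName,
  "--region", clusterRegion
).!!

// Pull the value of the configBucket field out of the YAML text.
val bucketName = describeOutput
  .split("\n")
  .find(_.trim.startsWith("configBucket:"))
  .map(_.split(":", 2)(1).trim)

println(bucketName.getOrElse("configBucket not found"))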

Upvotes: 1

cyxxy

Reputation: 608

Dataproc also publishes some attributes, including the bucket name, to GCE instance Metadata. You can also specify your own Metadata. See https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/metadata.

These will be available to you through the metadata server. For example, if you want to read the bucket name, you can run:

curl -s -H Metadata-Flavor:Google http://metadata.google.internal/computeMetadata/v1/instance/attributes/dataproc-bucket
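If you would rather read it from the Scala job itself instead of shelling out to curl, a minimal sketch (assuming the job runs on a Dataproc VM, so the metadata server is reachable):

import java.net.{HttpURLConnection, URL}
import scala.io.Source

// Query the GCE metadata server for the dataproc-bucket attribute.
val url = new URL(
  "http://metadata.google.internal/computeMetadata/v1/instance/attributes/dataproc-bucket")
val conn = url.openConnection().asInstanceOf[HttpURLConnection]
conn.setRequestProperty("Metadata-Flavor", "Google")  // required header

val bucketName = Source.fromInputStream(conn.getInputStream).mkString.trim
conn.disconnect()

println(s"Dataproc staging bucket: $bucketName")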

Upvotes: 2
