Reputation: 1095
Is there any flag available to give a custom job_id to Dataproc jobs? I am using this command to run Pig jobs:
gcloud dataproc jobs submit pig --cluster my_cluster --file my_queries.pig
I use similar commands to submit PySpark and Hive jobs.
This command generates a job_id on its own, which makes the jobs difficult to track later on.
Upvotes: 0
Views: 3443
Reputation: 922
Reading the gcloud code, you can see that the argument called id is used as the job name,
so you only need to add --id to your gcloud command:
gcloud dataproc jobs submit spark --id this-is-my-job-name --cluster my-cluster --class com.myClass.Main --jars gs://my.jar
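If you go the --id route, it may help to build the ID from a pipeline name and a date so it stays predictable. Below is a minimal sketch: make_job_id and submit_pig_job are hypothetical helper names, not part of gcloud, and the sanitizing rule assumes job IDs are limited to letters, digits, underscores, and hyphens, with at most 100 characters. As far as I know, job IDs also have to be unique within the project, so resubmitting with the same ID is expected to fail.

```shell
# Hypothetical helper: build a job ID from a pipeline name and a date.
# Any character outside letters, digits, underscores, and hyphens is
# replaced with a hyphen, and the result is cut to 100 characters
# (assumed Dataproc job ID limits).
make_job_id() {
  printf '%s-%s' "$1" "$2" | tr -c 'A-Za-z0-9_-' '-' | cut -c1-100
}

# Hypothetical wrapper around the command from the question.
# The cluster and file names are the ones used in the question.
submit_pig_job() {
  job_id="$(make_job_id "$1" "$2")"
  gcloud dataproc jobs submit pig --id "$job_id" \
    --cluster my_cluster --file my_queries.pig
}
```

Calling submit_pig_job mylogspipeline 20170508 would submit with the ID mylogspipeline-20170508, which you can later pass to gcloud dataproc jobs describe.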
Upvotes: 5
Reputation: 10677
While it's possible to provide your own generated job ID when using the underlying REST API, there isn't currently any way to specify your own job ID when submitting with gcloud dataproc jobs submit; this feature might be added in the future. That said, when people want to specify job IDs, they typically also want to list jobs with more complex match expressions, or to list multiple categories of jobs by different kinds of expressions at different points in time.
So, you might want to consider Dataproc labels instead; labels are intended specifically for this kind of use case and are optimized for efficient lookup. For example:
gcloud dataproc jobs submit pig --labels jobtype=mylogspipeline,date=20170508 ...
gcloud dataproc jobs submit pig --labels jobtype=mylogspipeline,date=20170509 ...
gcloud dataproc jobs submit pig --labels jobtype=mlpipeline,date=20170509 ...
gcloud dataproc jobs list --filter "labels.jobtype=mylogspipeline"
gcloud dataproc jobs list --filter "labels.date=20170509"
gcloud dataproc jobs list --filter "labels.date=20170509 AND labels.jobtype=mlpipeline"
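If you build these filters in a script, a small helper can join the label pairs for you. This is only a sketch: label_filter is a hypothetical function name, and it simply produces the same "labels.key=value AND ..." strings as the list commands above.

```shell
# Hypothetical helper: turn key=value pairs into a --filter expression,
# prefixing each pair with "labels." and joining them with AND.
label_filter() {
  filter=""
  for pair in "$@"; do
    if [ -n "$filter" ]; then
      filter="$filter AND "
    fi
    filter="${filter}labels.${pair}"
  done
  printf '%s\n' "$filter"
}

# Hypothetical wrapper: list the jobs that match all given labels.
list_jobs_by_labels() {
  gcloud dataproc jobs list --filter "$(label_filter "$@")"
}
```

For example, list_jobs_by_labels date=20170509 jobtype=mlpipeline runs the same query as the last list command above.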
Upvotes: 1