Reputation: 1492
I use a Google Dataflow job. I have the following command to deploy the job:
java -jar build/libs/my-job-1.0-all.jar \
--project=$PROJECT \
--region=us-central1 \
--subscription=$SUBSCRIPTION \
--jobName=my-job \
--runner=DataflowRunner \
--streaming=true \
--stableUniqueNames=ERROR \
--workerMachineType=n1-standard-2 \
--usePublicIps=false \
--network=default \
--update
This works fine when there is a running job in GCP, but it fails if there is nothing to update. If I remove the --update flag, it works fine as long as there is no running job.
Is there a way to specify that the job should be updated if it exists, and started as a new job otherwise?
Upvotes: 1
Views: 1186
Reputation: 3883
To update your job, you'll need to launch a new job to replace the ongoing one. When you launch your replacement job, you have to set the following pipeline options to perform the update process, in addition to the job's regular options:
- Pass the --update option.
- Set the --jobName option in PipelineOptions to the same name as the job you want to update.
- If any transform names in your pipeline have changed, you must supply a transform mapping and pass it using the --transformNameMapping option.
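Applied to the command from the question, an update launch might look like the sketch below. The --transformNameMapping value is a JSON map from old to new transform names; the names "ParseEvents" and "ParseMessages" here are purely hypothetical placeholders.

```shell
# Sketch of an update launch; only needed extras shown are --update and
# the (hypothetical) transform mapping. Other options stay as before.
java -jar build/libs/my-job-1.0-all.jar \
  --project=$PROJECT \
  --region=us-central1 \
  --subscription=$SUBSCRIPTION \
  --jobName=my-job \
  --runner=DataflowRunner \
  --streaming=true \
  --update \
  --transformNameMapping='{"ParseEvents":"ParseMessages"}'
```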
Please note that the --update flag makes sure that the in-flight data is not lost.
For updating the job, there is a direct procedure described under the "Launching your replacement job" section of the documentation. Additionally, be aware that:
Currently, updating batch pipelines is not supported.
One important thing: every time you want to update the job, it must already exist before running the update. If you want to check whether there is an ongoing job, you can prepare a script that queries the list of existing jobs using the Dataflow Command-line Interface (gcloud beta dataflow jobs list) and launches the job if it doesn't exist, or updates it otherwise.
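A minimal sketch of such a script, assuming bash and the job name from the question. The helper function name (update_flag) is my own; the actual gcloud query and java launch are shown commented out so the sketch itself has no side effects.

```shell
#!/usr/bin/env bash
# Sketch: decide whether to pass --update based on whether an active
# Dataflow job with the same name already exists.
set -euo pipefail

JOB_NAME="my-job"
REGION="us-central1"

# Echo "--update" if JOB_NAME appears in the given newline-separated list
# of active job names; otherwise echo nothing.
update_flag() {
  local active_jobs="$1"
  if printf '%s\n' "$active_jobs" | grep -qx "$JOB_NAME"; then
    echo "--update"
  fi
}

# Real deployment would look roughly like this (hypothetical wiring):
# ACTIVE_JOBS=$(gcloud beta dataflow jobs list --region="$REGION" \
#   --status=active --format="value(name)")
# java -jar build/libs/my-job-1.0-all.jar \
#   --project="$PROJECT" --region="$REGION" --jobName="$JOB_NAME" \
#   --runner=DataflowRunner --streaming=true \
#   $(update_flag "$ACTIVE_JOBS")
```

Using --status=active restricts the listing to jobs that are still running, so a previously drained or cancelled job with the same name won't trigger an update attempt.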
Upvotes: 1