Reputation: 52
We want to enable vertical autoscaling on our Dataflow Prime pipeline, which uses a Python container: https://cloud.google.com/dataflow/docs/vertical-autoscaling
We're trying to run our pipeline through this command:
gcloud dataflow jobs run process-data --additional-experiments=enable_prime --additional-experiments=enable_batch_vmr --additional-experiments=enable_vertical_memory_autoscaling ...
However, when we run this, the job is not a Dataflow Prime job. We can tell because the cost panel is enabled, which it shouldn't be for Dataflow Prime.
We see that all our experiment flags are fused together into one string and do not enable autoscaling:
['enable_prime,enable_batch_vmr,enable_vertical_memory_autoscaling', 'beam_fn_api', 'use_unified_worker', 'use_runner_v2', 'use_portable_job_submission', 'use_multiple_sdk_containers']
Has anyone been able to run the command with additional experiments?
Upvotes: 0
Views: 30
Reputation: 26
Actually, if we look closely at the doc that @jggp1094 linked, we can see that these flags are meant to be passed with the '--experiments' flag (NOT --additional-experiments). They are passed to the PipelineOptions as part of the Beam args, and should be parsed correctly that way.
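To illustrate why separate --experiments arguments matter, here is a minimal sketch using a stand-in argparse parser. It is not Beam's own code; it only assumes the option collects every occurrence of the flag into a list, and the helper name parse_experiments is hypothetical.

```python
import argparse

# Stand-in parser (an assumption, not Beam's actual implementation):
# every occurrence of --experiments is appended to one list.
parser = argparse.ArgumentParser()
parser.add_argument("--experiments", action="append", default=None)


def parse_experiments(argv):
    """Return the list of experiments collected from argv."""
    args, _unknown = parser.parse_known_args(argv)
    return args.experiments or []


# One flag per experiment: each experiment is its own list entry.
separate = parse_experiments([
    "--experiments=enable_prime",
    "--experiments=enable_batch_vmr",
    "--experiments=enable_vertical_memory_autoscaling",
])

# One comma-joined value: the parser keeps it as a single string.
fused = parse_experiments([
    "--experiments=enable_prime,enable_batch_vmr,enable_vertical_memory_autoscaling",
])

print(separate)
print(fused)
```

With a parser like this, the comma-joined form ends up as one entry, which matches the fused string seen in the question.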
Upvotes: 1
Reputation: 180
I think it’s how you're specifying the --additional-experiments flag. The gcloud command interprets multiple instances of this flag as a single, comma-separated string, not as distinct flags. This is why all your experiments are concatenated into a single string within the list you observed.
You need to provide the experiments as comma-separated values within one --additional-experiments flag. Also, enable_batch_vmr is for batch jobs; for streaming jobs (which I assume you're running, given the context of vertical autoscaling) it is unnecessary and might conflict.
Try using this command:
gcloud dataflow jobs run process-data --additional-experiments=enable_prime,enable_vertical_memory_autoscaling
This single --additional-experiments flag contains both enable_prime and enable_vertical_memory_autoscaling as comma-separated values. This should correctly enable Dataflow Prime and vertical autoscaling.
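To check whether the experiments actually arrived as separate entries, a small sketch like the following can be run against the experiments list shown for the job. The function names are hypothetical, and the idea that each experiment must be its own list entry is an assumption based on the output in the question.

```python
def find_fused_experiments(experiments):
    """Return entries that hold more than one comma-separated experiment."""
    return [entry for entry in experiments if "," in entry]


def flatten_experiments(experiments):
    """Split any comma-joined entries so each experiment stands alone."""
    flat = []
    for entry in experiments:
        flat.extend(part.strip() for part in entry.split(",") if part.strip())
    return flat


# The experiments list reported in the question.
observed = [
    "enable_prime,enable_batch_vmr,enable_vertical_memory_autoscaling",
    "beam_fn_api",
    "use_unified_worker",
    "use_runner_v2",
    "use_portable_job_submission",
    "use_multiple_sdk_containers",
]

# Experiments we expect to see as their own entries.
expected = {"enable_prime", "enable_vertical_memory_autoscaling"}

fused_entries = find_fused_experiments(observed)
missing_as_own_entry = sorted(expected - set(observed))

print("Fused entries:", fused_entries)
print("Not present as separate entries:", missing_as_own_entry)
print("Flattened:", flatten_experiments(observed))
```

If the first list is non-empty, the experiments were joined into one string somewhere between the command line and the job.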
Upvotes: 0