unitrium
unitrium

Reputation: 52

Vertical autoscaling dataflow experiments args don't get properly parsed

We want to enable vertical autoscaling on our dataflow prime pipeline for a python container: https://cloud.google.com/dataflow/docs/vertical-autoscaling

We're trying to run our pipeline through this command:

gcloud dataflow jobs run process-data --additional-experiments=enable_prime --additional_experiments=enable_batch_vmr --additional_experiments=enable_batch_vmr ...

However, when we run this our pipeline is not a dataflow prime pipeline (we see this because the cost panel is enabled which it shouldn't if it was dataflow prime).

We see that all our experiment flags are fused together into one string and do not enable autoscaling: ['enable_prime,enable_batch_vmr,enable_vertical_memory_autoscaling', 'beam_fn_api', 'use_unified_worker', 'use_runner_v2', 'use_portable_job_submission', 'use_multiple_sdk_containers']

dataflow capture of the experiments

Has anyone been able to someone to run the command with additional experiments.

Upvotes: 0

Views: 30

Answers (2)

Stav Nahum
Stav Nahum

Reputation: 26

Actually, if we look closely at the doc that @jggp1094 linked

We can notice that these flags are defined to be sent under the '--experiments' flag (NOT --additional-experiments) These flags are passed to the PipelineOptions as part of the beam args, and should be parsed correctly in this manner .

doc screenshot

Upvotes: 1

jggp1094
jggp1094

Reputation: 180

I think it’s how you're specifying the --additional-experiments flag. The gcloud command interprets multiple instances of this flag as a single, comma-separated string, not as distinct flags. This is why all your experiments are concatenated into a single string within the list you observed.

You need to provide the experiments as separate comma-separated values within one --additional-experiments flag. The enable_batch_vmr is for batch jobs while for streaming jobs (which I assume you're using given the context of vertical autoscaling), this is unnecessary and might be conflicting.

Try using this command:

gcloud dataflow jobs run process-data --additional-experiments=enable_prime,enable_vertical_memory_autoscaling

This single --additional-experiments flag contains both enable_prime and enable_vertical_memory_autoscaling as comma-separated values. This should correctly enable Dataflow Prime and vertical autoscaling.

Upvotes: 0

Related Questions