nbren12

Reputation: 684

How to parameterize Kubeflow Pipelines environment variables?

I am exploring Vertex AI Pipelines for running machine learning training jobs. The Kubeflow Pipelines docs are clear about how to parameterize the commands/arguments of a container.

Is it also possible to pass an input to an environment variable or to the image name of a component? The swagger schema for a component suggests that this can be done, but this example fails:

implementation:
  container:
    image: {concat: ["us.gcr.io/vcm-ml/emulator", {inputValue: tag}]}
    # command is a list of strings (command-line arguments). 
    # The YAML language has two syntaxes for lists and you can use either of them. 
    # Here we use the "flow syntax" - comma-separated strings inside square brackets.
    command: [
      python3, 
      # Path of the program inside the container
      /pipelines/component/src/program.py,
      --input1-path,
      {inputPath: input_1},
      --param1, 
      --output1-path, 
    ]
    env:
      NAME: {inputValue: env}
inputs:
- {name: tag, type: String}
- {name: env, type: String}
- {name: input_1, type: String, description: 'Data for input_1'}

Is passing an {inputValue} to container.env or container.image supported? Alternatively, is it possible to add an environment variable or change the image name using the V2 Python DSL?

Upvotes: 0

Views: 1437

Answers (1)

Ark-kun

Reputation: 6811

Sorry for the confusion.

Unfortunately, the JSON schema is wrong here: it differs from the implementation. The same is true for the image.

The env implementation uses a static map, and the image is static as well.

In Kubeflow Pipelines (v1) you might be able to set an environment variable to a dynamic value after you create the component instance, but this probably won't work in Vertex Pipelines.

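from kubernetes.client.models import V1EnvVar
my_task = my_op(...)
# Set MSG on the task's container from an upstream task's output (KFP v1 only)
my_task.container.add_env_variable(V1EnvVar(name='MSG', value=task1.outputs["out1"]))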

If this does not work you can create a GitHub issue in the KFP repo regarding the env support.

For the image, we usually advise having separate component files for different images.
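If maintaining several near-identical component files by hand is a burden, one option is to generate the component text per image. This is only a rough sketch and not part of KFP: the template is a trimmed version of the component from the question, the `render_component` helper and the tags `v1`/`v2` are hypothetical, and the `:` between the image name and the tag is an assumption about how your images are named.

```python
from string import Template

# Trimmed-down component text; only the image is templated.
# $image is filled in at generation time, so the image stays static
# from the point of view of the component implementation.
COMPONENT_TEMPLATE = Template("""\
inputs:
- {name: input_1, type: String, description: 'Data for input_1'}
implementation:
  container:
    image: "$image"
    command:
    - python3
    - /pipelines/component/src/program.py
    - --input1-path
    - {inputPath: input_1}
""")


def render_component(image: str) -> str:
    """Return component YAML text with a fixed (static) image."""
    return COMPONENT_TEMPLATE.substitute(image=image)


# Hypothetical tags; one component definition per image.
component_texts = {
    tag: render_component(f"us.gcr.io/vcm-ml/emulator:{tag}")
    for tag in ("v1", "v2")
}
print(component_texts["v1"])
```

Each generated text could then be written to its own component file, or (in the KFP v1 SDK) passed to kfp.components.load_component_from_text. That step is left out here so the sketch has no dependencies.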

A workaround would be a small wrapper script that sets the variable:

inputs:
- {name: tag, type: String}
- {name: env, type: String}
- {name: input_1, type: String, description: 'Data for input_1'}
implementation:
  container:
    image: "us.gcr.io/vcm-ml/emulator"
    command:
    - sh
    - -ec
    - 'NAME="$0" "$@"' # Set NAME to the first arg and execute the rest
    - {inputValue: env}
    - python3
      # Path of the program inside the container
    - /pipelines/component/src/program.py
    - --input1-path
    - {inputPath: input_1}
    - --param1 
    - --output1-path
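To see why the wrapper works, you can try the same sh invocation locally. With sh -c, the first argument after the script becomes $0 and the rest become "$@", so NAME is set only for the command that follows. The snippet below is a minimal sketch that assumes a POSIX sh is on the PATH; `run_with_env` is a hypothetical helper, and the child process is a stand-in for program.py that just prints NAME.

```python
import subprocess
import sys


def run_with_env(value, command):
    """Run `command` with NAME set to `value`, using the same sh wrapper."""
    # sh -ec SCRIPT ARG0 ARG1 ...  ->  $0 is ARG0, "$@" is ARG1 ...
    wrapper = ["sh", "-ec", 'NAME="$0" "$@"', value]
    result = subprocess.run(
        wrapper + command, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Stand-in for program.py: print the NAME variable set by the wrapper.
child = [sys.executable, "-c", "import os; print(os.environ['NAME'])"]
output = run_with_env("hello world", child)
print(output)
```

Because the value is passed as a separate argument rather than pasted into the script string, values containing spaces or quotes are passed through as-is.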

Upvotes: 1
