Wong Songhan

How do I get the output from a custom container and pass it to the next step of a Vertex AI/Kubeflow pipeline?

I am having difficulty understanding how to pass a result from a container as an output artifact. I understand that we need to write the output to a file, but I need an example of how to do it.

https://www.kubeflow.org/docs/components/pipelines/sdk-v2/component-development/

This is the last part of the Python container program, where I save the URL of the model file on GCS to output.txt:

with open('./output.txt', 'w') as f:
    logging.info(f"Model path url is in {'./output.txt'}")
    f.write(model_path)

This is the component YAML file:

name: Dummy Model Training
description: Train a dummy model and save to GCS
inputs:
  - name: input_url
    description: 'Input csv url.'
    type: String
  - name: gcs_url
    description: 'GCS bucket url.'
    type: String
outputs:
  - name: gcs_model_path
    description: 'Trained model path.'
    type: String
implementation:
    container:
        image: ${CONTAINER_REGISTRY}
        command: [
          python, ./app/trainer.py,
          --input_url, {inputValue: input_url},
          --gcs_url, {inputValue: gcs_url},
        ]


Answers (1)

chesu

First of all, your dummy component is missing a reference to the output. You need to use {outputPath: <output_name>} or {outputUri: <output_name>} to pass it into the container, so that your container code can write data to the system-generated path (outputPath) or URI such as "gs://..." (outputUri). To fix it, your component YAML could be:

name: Dummy Model Training
description: Train a dummy model and save to GCS
inputs:
  - name: input_url
    description: 'Input csv url.'
    type: String
  - name: gcs_url
    description: 'GCS bucket url.'
    type: String
outputs:
  - name: gcs_model_path
    description: 'Trained model path.'
    type: String
implementation:
    container:
        image: ${CONTAINER_REGISTRY}
        command: [
          python, ./app/trainer.py,
          --input_url, {inputValue: input_url},
          --gcs_url, {inputValue: gcs_url},
          --output_model_path, {outputPath: gcs_model_path}
        ]

Then your code should write to this passed-in path (the value of --output_model_path) instead of './output.txt'.
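Here is a minimal sketch of how trainer.py could handle that flag, assuming it uses argparse to read the arguments listed in the YAML above. The training itself is stubbed out, and the file name model.joblib is a hypothetical placeholder. Creating the parent directory before writing is a precaution, since the system-generated path may point into a directory that does not exist yet.

```python
import argparse
import logging
import os


def write_output(value: str, output_path: str) -> None:
    """Write a string output to the file path supplied by the pipeline."""
    # The parent directory of the system-generated path may not exist yet,
    # so create it before opening the file.
    parent = os.path.dirname(output_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(output_path, "w") as f:
        f.write(value)
    logging.info("Model path %s written to %s", value, output_path)


def main(argv=None) -> str:
    parser = argparse.ArgumentParser(description="Dummy model training")
    parser.add_argument("--input_url", required=True)
    parser.add_argument("--gcs_url", required=True)
    # Receives the {outputPath: gcs_model_path} placeholder from the YAML.
    parser.add_argument("--output_model_path", required=True)
    args = parser.parse_args(argv)

    # Stub for the real training code: assume the model is saved under
    # gcs_url ("model.joblib" is a hypothetical file name).
    model_path = args.gcs_url.rstrip("/") + "/model.joblib"

    # Write the model URL to the pipeline-provided path, not ./output.txt.
    write_output(model_path, args.output_model_path)
    return model_path
```

In the real trainer.py you would call main() from an `if __name__ == "__main__":` block at the bottom of the file, so the command line assembled from the component YAML reaches argparse.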

As for consuming the output in a downstream component, here's a simple, runnable example you can try out on Vertex Pipelines: https://github.com/kubeflow/pipelines/blob/bf2389a66c164457b0e10a820ba484992fd7dd1a/sdk/python/test_data/pipelines/two_step_pipeline.py
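On the consuming side, a hypothetical downstream component could declare an input (say model_path) and pass it to its container with {inputPath: model_path}. In the pipeline function you would then connect it to the training step's output, e.g. train_task.outputs['gcs_model_path'], where train_task is a hypothetical name for the training task. With inputPath the container receives a local file path whose contents are the upstream value, so the program only needs to read that file. The names model_path, --model_path_file and train_task are assumptions for illustration. If you declare the input with {inputValue: model_path} instead, the string should arrive directly as a command-line argument and no file read is needed.

```python
import argparse


def read_input(path: str) -> str:
    """Read a string input that the pipeline passed in as a local file."""
    with open(path) as f:
        # Strip surrounding whitespace in case the producer wrote a newline.
        return f.read().strip()


def main(argv=None) -> str:
    parser = argparse.ArgumentParser(description="Hypothetical evaluation step")
    # Receives the {inputPath: model_path} placeholder (hypothetical YAML).
    parser.add_argument("--model_path_file", required=True)
    args = parser.parse_args(argv)

    model_path = read_input(args.model_path_file)
    # Stub: replace with real work, e.g. loading the model from model_path.
    print(f"Evaluating model at {model_path}")
    return model_path
```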
