coderhk

Reputation: 276

How do I implement the Kubeflow "Run Parameters" with the TFX SDK specialized for GCP?

I am currently using Kubeflow as my orchestrator; the orchestrator is actually an AI Platform Pipelines instance hosted on GCP. How do I create runtime parameters using the TensorFlow Extended (TFX) SDK? I suspect this is the class I should use, but the documentation is not very helpful and provides no examples: https://www.tensorflow.org/tfx/api_docs/python/tfx/orchestration/data_types/RuntimeParameter

Something like the picture below. [Screenshot of the Kubeflow Pipelines "Run parameters" form omitted]

Upvotes: 2

Views: 738

Answers (1)

Daniek Brink

Reputation: 91

Say, for example, you want to add the module file location as a runtime parameter that is passed to the Transform component in your TFX pipeline.

Start by setting up your setup_pipeline.py and defining the module file parameter:

# setup_pipeline.py

from typing import Text
from tfx.orchestration import data_types, pipeline
from tfx.orchestration.kubeflow import kubeflow_dag_runner
from tfx.components import Transform

_module_file_param = data_types.RuntimeParameter(
    name='module-file',
    default='/tfx-src/tfx/examples/iris/iris_utils_native_keras.py',
    ptype=Text,
)
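
The same mechanism works for other pipeline settings. For example, the pipeline root is often exposed as a run parameter in the same way; a minimal sketch continuing in setup_pipeline.py (the GCS path below is a placeholder, not from the original answer):

# Hypothetical: expose the pipeline root as a run parameter too.
# The default GCS path is a placeholder.
_pipeline_root_param = data_types.RuntimeParameter(
    name='pipeline-root',
    default='gs://my-bucket/tfx-pipeline-output',
    ptype=Text,
)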

Next, define a function that specifies the components used in your pipeline and pass along the parameter.

def create_pipeline(..., module_file):
    # setup components:
    ...

    transform = Transform(
        ...
        module_file=module_file,
    )
    ...

    components = [..., transform, ...]

    return pipeline.Pipeline(
        ...,
        components=components,
    )
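
For concreteness, here is a minimal sketch of what a filled-in create_pipeline could look like; the component list, argument names, and data path are illustrative assumptions for a recent TFX release, not part of the original answer:

# Hypothetical, fleshed-out create_pipeline for illustration only.
from typing import Text

from tfx.components import CsvExampleGen, SchemaGen, StatisticsGen, Transform
from tfx.orchestration import pipeline


def create_pipeline(pipeline_name: Text, pipeline_root: Text,
                    data_root: Text, module_file) -> pipeline.Pipeline:
    # Ingest CSV data from an assumed location.
    example_gen = CsvExampleGen(input_base=data_root)
    # Compute statistics and infer a schema so Transform has inputs.
    statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
    schema_gen = SchemaGen(statistics=statistics_gen.outputs['statistics'])
    # The RuntimeParameter is passed through as-is; TFX resolves it to a
    # concrete string when the run is started from the Kubeflow UI.
    transform = Transform(
        examples=example_gen.outputs['examples'],
        schema=schema_gen.outputs['schema'],
        module_file=module_file,
    )
    return pipeline.Pipeline(
        pipeline_name=pipeline_name,
        pipeline_root=pipeline_root,
        components=[example_gen, statistics_gen, schema_gen, transform],
    )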

Finally, set up the Kubeflow DAG runner so that it passes the parameter along to the create_pipeline function. See here for a more complete example.

if __name__ == "__main__":

    # instantiate a kfp_runner
    ...

    kfp_runner = kubeflow_dag_runner.KubeflowDagRunner(
        ...
    )

    kfp_runner.run(
        create_pipeline(..., module_file=_module_file_param))
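
For reference, a sketch of what the elided runner setup might contain, assuming the default Kubeflow metadata config from the TFX SDK; the container image URI is a placeholder:

# Hypothetical runner configuration; the TFX image URI is a placeholder.
runner_config = kubeflow_dag_runner.KubeflowDagRunnerConfig(
    kubeflow_metadata_config=(
        kubeflow_dag_runner.get_default_kubeflow_metadata_config()),
    tfx_image='gcr.io/my-project/my-tfx-image',
)

kfp_runner = kubeflow_dag_runner.KubeflowDagRunner(config=runner_config)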

Then you can run python -m setup_pipeline, which will produce the pipeline-config file (a YAML spec, packaged by the runner); you can then upload it through the Kubeflow Pipelines interface on GCP.
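
If you prefer not to use the web UI, the compiled package can also be uploaded programmatically with the kfp SDK; a sketch, where the host URL, package file name, and pipeline name are all placeholders:

# Hypothetical programmatic upload; host, file, and names are placeholders.
import kfp

client = kfp.Client(host='https://<your-pipelines-endpoint>')
client.upload_pipeline(
    pipeline_package_path='my_pipeline.tar.gz',
    pipeline_name='my-tfx-pipeline',
)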

Upvotes: 4
