rajeshnair

Reputation: 1673

Where does a Kubeflow pipeline look for the packages in `packages_to_install`?

I am using Kubeflow Pipelines in Vertex AI to create my ML pipeline, and I have been able to use standard packages in a Kubeflow component with the syntax below:

from kfp.v2.dsl import component, Output, Metrics, Model

@component(
    # this component builds an XGBoost classifier
    # note: the installable PyPI name is "scikit-learn"; the "sklearn" alias is deprecated
    packages_to_install=["google-cloud-bigquery", "xgboost", "pandas", "scikit-learn", "joblib", "pyarrow"],
    base_image="python:3.9",
    output_component_file="output_component/create_xgb_model_xgboost.yaml"
)
def build_xgb_xgboost(project_id: str,
                      data_set_id: str,
                      training_view: str,
                      metrics: Output[Metrics],
                      model: Output[Model]
):
    ...  # training logic omitted

Now I need to add my custom Python module to `packages_to_install`. Is there a way to do that? To answer this, I need to understand where KFP looks for packages when it installs them on top of `base_image`. I understand this can be achieved with a custom `base_image` that I build with my Python module in it, but that seems like overkill, and I would prefer to specify the module in the component specification where applicable, something like this:

@component(
    # this component builds an XGBoost classifier
    packages_to_install=["my-custom-python-module", "google-cloud-bigquery", "xgboost", "pandas", "scikit-learn", "joblib", "pyarrow"],
    base_image="python:3.9",
    output_component_file="output_component/create_xgb_model_xgboost.yaml"
)
def build_xgb_xgboost(project_id: str,
                      data_set_id: str,
                      training_view: str,
                      metrics: Output[Metrics],
                      model: Output[Model]
):
    ...  # training logic omitted

Upvotes: 1

Views: 3217

Answers (3)

rajeshnair

Reputation: 1673

I found the answer to this.

Starting with KFP SDK 1.8.12, you can point a component at custom package indexes through the `pip_index_urls` argument (see the Kubeflow feature request).

With this feature, I can install my custom Python module like this:

from kfp.v2.dsl import component, Output, Metrics, Model

# placeholder: replace with the URL of your own PyPI-compatible package index
CUSTOM_ARTEFACT_REPO = "https://your-private-index.example.com/simple"

@component(
    # this component builds an XGBoost classifier
    pip_index_urls=[CUSTOM_ARTEFACT_REPO, "https://pypi.python.org/simple"],
    packages_to_install=["my-custom-python-module", "google-cloud-bigquery", "xgboost", "pandas", "scikit-learn", "joblib", "pyarrow"],
    base_image="python:3.9",
    output_component_file="output_component/create_xgb_model_xgboost.yaml"
)
def build_xgb_xgboost(project_id: str,
                      data_set_id: str,
                      training_view: str,
                      metrics: Output[Metrics],
                      model: Output[Model]
):
    ...  # training logic omitted
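As I understand it, a lightweight component runs `pip install` inside its container before executing the function, and `pip_index_urls` only changes which indexes pip queries: the first URL replaces PyPI as the main index and any further URLs are added as extra indexes. The sketch below just builds such a command so it can be inspected. `build_pip_install_command` is a hypothetical helper, not part of the KFP API, and the exact flags KFP emits (for example `--trusted-host`) may differ between SDK versions.

```python
import shlex
from typing import List, Optional


def build_pip_install_command(
    packages_to_install: List[str],
    pip_index_urls: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper approximating the pip call a component runs at startup.

    This is not KFP source code; the flags are an assumption for illustration.
    """
    cmd = ["python3", "-m", "pip", "install", "--quiet"]
    if pip_index_urls:
        primary, *extras = pip_index_urls
        # The first URL replaces pip's default index (PyPI)
        cmd += ["--index-url", primary]
        # Every further URL is consulted in addition to the primary index
        for url in extras:
            cmd += ["--extra-index-url", url]
    # With no pip_index_urls, pip falls back to its default index, i.e. PyPI
    cmd += packages_to_install
    return cmd


if __name__ == "__main__":
    command = build_pip_install_command(
        ["my-custom-python-module", "xgboost"],
        ["https://your-private-index.example.com/simple", "https://pypi.python.org/simple"],
    )
    print(shlex.join(command))
```

One consequence worth knowing: pip treats the main and extra indexes as equals and picks the best matching version across all of them, so a same-named package on public PyPI could shadow the private one. Giving the custom module a distinctive name reduces that risk.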

Upvotes: 1

chesu

Reputation: 66

For this I need to understand where does KFP look for packages when installing those on top of base_image.

Whatever you specify in `packages_to_install` is passed to the `pip install` command, so by default pip looks for the packages on PyPI. Because pip supports installing from version control, you can also install a package straight from a source repository. See examples: https://packaging.python.org/en/latest/tutorials/installing-packages/#installing-from-vcs
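Since each entry is handed to pip as a requirement, a Git-hosted module can be listed as a PEP 508 direct reference of the form `name @ git+URL@ref`. A minimal sketch, assuming the module lives in a Git repository reachable from the pipeline's runtime; the `vcs_requirement` helper, the repository URL and the tag are hypothetical. This also assumes `git` is available in the container: as far as I know the full `python:3.9` image ships it, but the `-slim` variants do not, and a private repository would additionally need credentials.

```python
def vcs_requirement(package: str, repo_url: str, ref: str = "main",
                    subdirectory: str = "") -> str:
    """Hypothetical helper: build a PEP 508 direct reference to a Git repository."""
    requirement = f"{package} @ git+{repo_url}@{ref}"
    if subdirectory:
        # Needed when pyproject.toml / setup.py is not at the repository root
        requirement += f"#subdirectory={subdirectory}"
    return requirement


# Hypothetical repository URL and tag; replace with your own.
packages_to_install = [
    vcs_requirement(
        "my-custom-python-module",
        "https://github.com/your-org/my-custom-python-module.git",
        ref="v0.1.0",
    ),
    "google-cloud-bigquery",
    "xgboost",
]
```

Pinning `ref` to a tag or commit rather than a branch keeps pipeline runs reproducible, since the package is fetched anew every time the component starts.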

Upvotes: 1

IronPan

Reputation: 31

Under the hood, the packages are installed at runtime, when the component executes. That means each package must be hosted somewhere the runtime environment can reach when the step runs.

Given that, you need to upload the package to a location that remains accessible later, e.g. a Git repository, as Jose mentioned.

Upvotes: 1
