Reputation: 1673
I am using Kubeflow Pipelines in Vertex AI to create my ML pipeline, and I have been able to install standard Python packages in a Kubeflow component using the syntax below:
# KFP SDK 1.8.x import path (assumed; the original snippet omits its imports)
from kfp.v2.dsl import component, Output, Metrics, Model

@component(
    # this component builds an xgboost classifier with xgboost
    packages_to_install=["google-cloud-bigquery", "xgboost", "pandas", "sklearn", "joblib", "pyarrow"],
    base_image="python:3.9",
    output_component_file="output_component/create_xgb_model_xgboost.yaml"
)
def build_xgb_xgboost(project_id: str,
                      data_set_id: str,
                      training_view: str,
                      metrics: Output[Metrics],
                      model: Output[Model]
):
    ...
Now I need to add my custom Python module to packages_to_install. Is there a way to do it? For this I need to understand where KFP looks for packages when installing them on top of base_image.
I understand this can be achieved with a custom base_image that already has my Python module built into it. But that seems like overkill to me, and I would prefer to specify the Python module where applicable in the component specification.
Something like this:
@component(
    # this component builds an xgboost classifier with xgboost
    packages_to_install=["my-custom-python-module", "google-cloud-bigquery", "xgboost", "pandas", "sklearn", "joblib", "pyarrow"],
    base_image="python:3.9",
    output_component_file="output_component/create_xgb_model_xgboost.yaml"
)
def build_xgb_xgboost(project_id: str,
                      data_set_id: str,
                      training_view: str,
                      metrics: Output[Metrics],
                      model: Output[Model]
):
    ...
Upvotes: 1
Views: 3217
Reputation: 1673
I found the answer to this.
As of KFP SDK 1.8.12, the @component decorator accepts a pip_index_urls argument, which lets you point pip at a custom package index.
See the Kubeflow feature request.
With this feature, I can install my custom Python module like this:
@component(
    # this component builds an xgboost classifier with xgboost
    pip_index_urls=[CUSTOM_ARTEFACT_REPO, "https://pypi.python.org/simple"],
    packages_to_install=["my-custom-python-module", "google-cloud-bigquery", "xgboost", "pandas", "sklearn", "joblib", "pyarrow"],
    base_image="python:3.9",
    output_component_file="output_component/create_xgb_model_xgboost.yaml"
)
def build_xgb_xgboost(project_id: str,
                      data_set_id: str,
                      training_view: str,
                      metrics: Output[Metrics],
                      model: Output[Model]
):
    ...
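To make the effect of pip_index_urls more concrete, here is a rough sketch of the kind of pip command line that a list of index URLs plus a list of packages could translate to. This is not KFP's actual code: build_pip_command is a hypothetical helper, the example.com URL is a placeholder, and the sketch assumes the first URL becomes --index-url and the remaining ones become --extra-index-url, which is how pip itself separates its primary index from additional ones.

```python
import shlex
from typing import List, Optional


def build_pip_command(packages: List[str],
                      index_urls: Optional[List[str]] = None) -> str:
    """Hypothetical helper: assemble a pip install command line.

    Assumption: the first index URL is the primary index and any
    further URLs are additional indexes.
    """
    args = ["python3", "-m", "pip", "install"]
    if index_urls:
        # Primary index replaces pip's default (PyPI).
        args += ["--index-url", index_urls[0]]
        # Remaining indexes are searched in addition to the primary one.
        for url in index_urls[1:]:
            args += ["--extra-index-url", url]
    args += packages
    # shlex.join quotes each argument so the result is safe to paste into a shell.
    return shlex.join(args)


if __name__ == "__main__":
    print(build_pip_command(
        ["my-custom-python-module", "xgboost"],
        ["https://example.com/simple", "https://pypi.org/simple"],  # placeholder private index first
    ))
```

Listing the public index after the private one, as in the answer above, keeps the public packages (xgboost, pandas and so on) installable alongside the custom module.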
Upvotes: 1
Reputation: 66
For this I need to understand where does KFP look for packages when installing those on top of base_image.
Whatever you specify in packages_to_install is passed to the pip install command, so by default it looks for packages on PyPI.
You can also install a package from source control, since pip supports it. See examples: https://packaging.python.org/en/latest/tutorials/installing-packages/#installing-from-vcs
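As a minimal sketch of the source-control route, the snippet below builds a requirement string that pip understands as a direct reference to a git repository and puts it in a package list. The helper git_requirement, the repository URL and the tag v0.1.0 are all hypothetical placeholders; substitute your own.

```python
from typing import List


def git_requirement(name: str, repo_url: str, ref: str) -> str:
    """Hypothetical helper: build a pip direct-reference requirement.

    The "name @ git+<url>@<ref>" form is pip's syntax for installing a
    named package from a git repository at a given tag, branch or commit.
    """
    return f"{name} @ git+{repo_url}@{ref}"


# Placeholder repository and tag, not a real project.
CUSTOM_MODULE = git_requirement(
    "my-custom-python-module",
    "https://github.com/example-org/my-custom-python-module.git",
    "v0.1.0",
)

# This list is what would be passed as packages_to_install=... in @component.
PACKAGES_TO_INSTALL: List[str] = [CUSTOM_MODULE, "xgboost", "pandas"]

if __name__ == "__main__":
    print(PACKAGES_TO_INSTALL)
```

For this to work at runtime, the container needs a git executable and network access to the repository, and a private repository would additionally need credentials available inside the component's environment.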
Upvotes: 1
Reputation: 31
Under the hood, the step installs the package at runtime, when the component executes. This requires the package to be hosted somewhere the runtime environment can reach.
Given that, you need to upload the package to such a location, e.g. a git repository, as Jose mentioned.
Upvotes: 1