Reputation: 51
We are trying to return some metrics from our Vertex Pipeline, such that they are visible in the Run Comparison and Metadata tools in the Vertex UI.
I saw here that we can use the output type Output[Metrics] and the corresponding metrics.log_metric("metric_name", metric_val) method to add metrics, and from the available documentation it seemed that this would be enough.
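For context, log_metric just records key/value pairs on the artifact's metadata, which the backend later surfaces in the Run Comparison and Metadata views. A simplified stand-in for the kfp class (illustration only, not the real implementation):

```python
class Metrics:
    """Simplified stand-in for kfp's Metrics artifact (illustration only)."""

    def __init__(self, path=""):
        self.path = path    # local file path backing the artifact
        self.metadata = {}  # key/value pairs shown in the Vertex UI

    def log_metric(self, name, value):
        # Each logged metric becomes an entry in the artifact's metadata dict
        self.metadata[name] = value


metrics = Metrics("/tmp/metrics")
metrics.log_metric("accuracy", 0.95)
print(metrics.metadata)
```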
We want to use the reusable component method, as opposed to the Python function-based components on which the example is based. So we implemented it within our component code like so:
We added the output in the component.yaml:
outputs:
- name: metrics
type: Metrics
description: evaluation metrics path
then added the output to the command in the implementation:
command: [
python3, main.py,
--gcs-test-data-path, {inputValue: gcs_test_data_path},
--gcs-model-path, {inputValue: gcs_model_path},
--gcs-output-bucket-id, {inputValue: gcs_output_bucket_id},
--project-id, {inputValue: project_id},
--timestamp, {inputValue: timestamp},
--batch-size, {inputValue: batch_size},
--img-height, {inputValue: img_height},
--img-width, {inputValue: img_width},
--img-depth, {inputValue: img_depth},
--metrics, {outputPath: metrics},
]
Next, in the component's main Python script, we parse this argument with argparse:
PARSER.add_argument('--metrics',
type=Metrics,
required=False,
help='evaluation metrics output')
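Note that with {outputPath: metrics} in the component spec, the value handed to the container is a file-system path string, so argparse receives a plain str before any type conversion is applied; for example (the path here is illustrative, not from a real run):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--metrics")  # no type=, so the raw string is kept

# Simulate the resolved {outputPath: metrics} placeholder (illustrative path)
args = parser.parse_args(["--metrics", "/tmp/outputs/metrics/data"])
print(type(args.metrics).__name__, args.metrics)
```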
and pass it to the component's main function:
if __name__ == '__main__':
ARGS = PARSER.parse_args()
evaluation(gcs_test_data_path=ARGS.gcs_test_data_path,
gcs_model_path=ARGS.gcs_model_path,
gcs_output_bucket_id=ARGS.gcs_output_bucket_id,
project_id=ARGS.project_id,
timestamp=ARGS.timestamp,
batch_size=ARGS.batch_size,
img_height=ARGS.img_height,
img_width=ARGS.img_width,
img_depth=ARGS.img_depth,
metrics=ARGS.metrics,
)
In the declaration of the component function, we then typed this metrics parameter as Output[Metrics]:
from kfp.v2.dsl import Output, Metrics
def evaluation(gcs_test_data_path: str,
gcs_model_path: str,
gcs_output_bucket_id: str,
metrics: Output[Metrics],
project_id: str,
timestamp: str,
batch_size: int,
img_height: int,
img_width: int,
img_depth: int):
Finally, we call the log_metric method within this evaluation function:
metrics.log_metric('accuracy', acc)
metrics.log_metric('precision', prec)
metrics.log_metric('recall', recall)
metrics.log_metric('f1-score', f_1)
When we run this pipeline, we can see this metric artifact materialised in the DAG:
And Metrics Artifacts are listed in the Metadata UI in Vertex:
However, clicking through to view the artifacts JSON, there is no Metadata listed:
In addition, No Metadata is visible when comparing runs in the pipeline UI:
Finally, navigating to the artifact's URI in GCS, we are met with 'Requested entity was not found.', which I assume indicates that nothing was written to GCS:
Are we doing something wrong with this implementation of metrics in the reusable components? From what I can tell, this all seems right, but it's hard to be sure given that the docs at this point focus primarily on examples with Python function-based components.
Do we perhaps need to proactively write this Metrics object to an OutputPath?
Any help is appreciated.
----- UPDATE -----
I have since been able to get the artifact metadata and URI to update. In the end we used the kfp SDK to generate a yaml file based on a @component-decorated Python function, then adapted this format for our reusable components. Our component.yaml now looks like this:
name: predict
description: Prepare and create predictions request
implementation:
container:
args:
- --executor_input
- executorInput: null
- --function_to_execute
- predict
command:
- python3
- -m
- kfp.v2.components.executor_main
- --component_module_path
- predict.py
image: gcr.io/PROJECT_ID/kfp/components/predict:latest
inputs:
- name: input_1
type: String
- name: input_2
type: String
outputs:
- name: output_1
type: Dataset
- name: output_2
type: Dataset
With this change to the yaml, we can now successfully update the artifact's metadata dictionary, and its URI through artifact.path = '/path/to/file'. These updates are displayed in the Vertex UI.
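For anyone adapting this: with the --executor_input flag, Vertex injects a JSON blob describing the component's inputs and outputs, and kfp.v2.components.executor_main uses it to construct the Output[...] artifacts your function receives. A rough, simplified illustration of its shape (field values here are assumptions, not taken from a real run):

```python
# Simplified sketch of an executor_input payload (shape is illustrative)
executor_input = {
    "inputs": {"parameterValues": {"input_1": "a", "input_2": "b"}},
    "outputs": {
        "artifacts": {
            "output_1": {
                "artifacts": [{"name": "output_1", "uri": "gs://my-bucket/output_1"}]
            }
        },
        "outputFile": "/tmp/executor_output.json",
    },
}

# The executor derives a writable local path from each artifact's URI;
# on Vertex, gs:// URIs are typically mapped to a /gcs/ mount.
uri = executor_input["outputs"]["artifacts"]["output_1"]["artifacts"][0]["uri"]
local_path = uri.replace("gs://", "/gcs/", 1)
print(local_path)
```

This is why updating artifact.path (or the metadata dict) inside the function is enough: the executor serializes those changes back out for Vertex to record.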
I am still unsure why the component.yaml format specified in the Kubeflow documentation does not work - I think this may be a bug with Vertex Pipelines.
Upvotes: 0
Views: 1098
Reputation: 761
From the code you are running, everything looks like it should work without a problem; but, as you commented, I would recommend writing the metrics object to a path so that it lands somewhere within your project.
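For example, you could serialize the logged values yourself to the output path the pipeline passes in (a minimal sketch; the exact file layout Vertex expects for a Metrics artifact may differ):

```python
import json
import os
import tempfile


def write_metrics(metrics_path, values):
    """Write a flat name -> value metrics dict as JSON to the given output path."""
    # The parent directory may not exist yet inside the container
    os.makedirs(os.path.dirname(metrics_path), exist_ok=True)
    with open(metrics_path, "w") as f:
        json.dump(values, f)


# Illustrative path; in a real run this would be the resolved {outputPath: metrics}
path = os.path.join(tempfile.mkdtemp(), "metrics")
write_metrics(path, {"accuracy": 0.95, "recall": 0.9})
with open(path) as f:
    print(json.load(f))
```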
Upvotes: 1