Jonas Boecquaert

Reputation: 51

How to export metrics from a containerized component in kubeflow pipelines 0.2.5

I have a pipeline made up of three containerized components. In the last component I write the metrics I want to a file named /mlpipeline-metrics.json, just like it's explained here. This is the Python code I used.

import json

metrics = {
    'metrics': [
        {
            'name': 'accuracy',
            'numberValue':  accuracy,
            'format': 'PERCENTAGE',
        },
        {
            'name': 'average-f1-score',
            'numberValue': average_f1_score,
            'format': 'PERCENTAGE'
        },
    ]
}

with open('/mlpipeline-metrics.json', 'w') as f:
    json.dump(metrics, f)

I also tried writing the file with the following code, just like in the example linked above.

from tensorflow.python.lib.io import file_io  # import used in the linked example

with file_io.FileIO('/mlpipeline-metrics.json', 'w') as f:
    json.dump(metrics, f)

The pipeline runs just fine without any errors. But it won't show the metrics in the front-end UI.

I'm thinking it has something to do with the following code block.

def metric_op(accuracy, f1_scores):
    return dsl.ContainerOp(
        name='visualize_metrics',
        image='gcr.io/mgcp-1190085-asml-lpd-dev/kfp/jonas/container_tests/image_metric_comp',
        arguments=[
            '--accuracy', accuracy,
            '--f1_scores', f1_scores,
        ]
    )

This is the code I use to create a ContainerOp from the containerized component. Notice I have not specified any file_outputs. In other ContainerOps I have to specify file_outputs to be able to pass variables to the next steps in the pipeline (see the sketch below for what I mean). Should I do something similar here to map /mlpipeline-metrics.json onto something so that Kubeflow Pipelines detects it?
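
For reference, this is roughly what those other ops look like (a sketch with placeholder names and paths, not my actual components):

def train_op():
    return dsl.ContainerOp(
        name='train',
        image='gcr.io/my-project/train',  # placeholder image
        file_outputs={
            # each entry maps an output name to a file the container writes;
            # a downstream step consumes it as train.outputs['accuracy']
            'accuracy': '/output/accuracy.txt',
            'f1_scores': '/output/f1_scores.txt',
        }
    )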

I'm using a managed AI Platform Pipelines deployment running Kubeflow Pipelines 0.2.5 with Python 3.6.8.

Any help is appreciated.

Upvotes: 2

Views: 964

Answers (2)

Franco

Reputation: 868

Here is another way of showing metrics when you write Python function-based components:

# Define your component's code as a standalone Python function:
from typing import NamedTuple

def add(a: float, b: float) -> NamedTuple(
    'AddOutput',
    [
        ('sum', float),
        ('mlpipeline_metrics', 'Metrics')
    ]
):
    '''Calculates the sum of two arguments'''
    sum = a + b

    metrics = {
        'metrics': [  # the UI expects the top-level key to be 'metrics'
            {
                'name': 'sum',
                'numberValue': float(sum),
            }
        ]
    }
    print("Add Result: ", sum)  # this shows up in the task's main logs

    from collections import namedtuple
    addOutput = namedtuple(
        'AddOutput',
        ['sum', 'mlpipeline_metrics'])
    return addOutput(sum, metrics)  # the metrics output is picked up by the pipeline UI

Note: I am just using a basic function here, not your function.
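
To actually run it, you can wrap the function with kfp.components.func_to_container_op and wire it into a pipeline, roughly like this (a minimal sketch; the pipeline name and default arguments are illustrative):

from kfp import dsl
from kfp.components import func_to_container_op

# Turn the standalone Python function into a pipeline component.
add_op = func_to_container_op(add)

@dsl.pipeline(
    name='Add pipeline',
    description='Minimal sketch that exports metrics from a lightweight component.'
)
def add_pipeline(a: float = 1.0, b: float = 2.0):
    # The 'mlpipeline_metrics' output is picked up automatically by the UI.
    add_op(a, b)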

Upvotes: 0

Jonas Boecquaert

Reputation: 51

So after some trial and error I finally came to a solution, and I'm happy to say that my intuition was right: it did have something to do with the file_outputs I didn't specify. To be able to export your metrics, you have to set file_outputs as follows.

def metric_op(accuracy, f1_scores):
    return dsl.ContainerOp(
        name='visualize_metrics',
        image='gcr.io/mgcp-1190085-asml-lpd-dev/kfp/jonas/container_tests/image_metric_comp',
        arguments=[
            '--accuracy', accuracy,
            '--f1_scores', f1_scores,
        ],
        file_outputs={
            'mlpipeline-metrics': '/mlpipeline-metrics.json'
        }
    )
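
For completeness, the op is then wired into the pipeline like any other step (a sketch; train_op stands in for my actual upstream component, which produces the accuracy and F1 outputs via its own file_outputs):

from kfp import dsl

@dsl.pipeline(
    name='metrics-example',
    description='Sketch of passing upstream outputs into the metrics op.'
)
def my_pipeline():
    train = train_op()  # placeholder upstream ContainerOp
    metric_op(train.outputs['accuracy'], train.outputs['f1_scores'])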

Upvotes: 2
