mirekphd

Reputation: 6781

How to add more metrics to a finished MLflow run?

Once an MLflow run is finished, external scripts can read its parameters and metrics through the Python MLflow client and mlflow.get_run(run_id), but the Run object returned by get_run seems to be read-only.

Specifically, .log_param, .log_metric, and .log_artifact cannot be called on the object returned by get_run; they raise errors like this:

AttributeError: 'Run' object has no attribute 'log_param'

If we instead call any of the .log_* functions on the mlflow module, they log to a new run with an auto-generated run ID in the Default experiment.

Example:

final_model_mlflow_run = mlflow.get_run(final_model_mlflow_run_id)

with mlflow.ActiveRun(run=final_model_mlflow_run) as myrun:    
    
    # this read operation uses correct run
    run_id = myrun.info.run_id
    print(run_id)
    
    # this write operation writes to a new run 
    # (with auto-generated random run ID) 
    # in the "Default" experiment (with exp. ID of 0)
    mlflow.log_param("test3", "This is a test")
   

Note that the problem occurs regardless of the run status: .info.status can be either "FINISHED" or "RUNNING" without making any difference.

I wonder if this read-only behavior is by design (given that immutable modeling runs improve experiment reproducibility)? I can appreciate that, but it also goes against code modularity if everything has to be done within a single monolith like the with mlflow.start_run() context...

Upvotes: 7

Views: 6244

Answers (2)

John Robert

Reputation: 21

This question can be divided into two parts:

  • Modifying an existing param/metric/log in a previous run. For instance:

    old: mlflow.log_metric("accuracy", 0.85)
    new: mlflow.log_metric("accuracy", 0.90)

  • Adding a new param/metric/log to a previous run. For instance:

    old: mlflow.log_metric("accuracy", 0.85)
    new: mlflow.log_metric("dropout", 0.05)

In MLflow v1.13 to v1.19 you can do both: modify params in a previous run and add new params to it. You need the run_id of that previous run to do so.

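import mlflow

# point MLflow at the tracking server and experiment
# (track_uri, experiment_name and run_name are assumed to be defined earlier)
mlflow.set_tracking_uri(track_uri)
mlflow.set_experiment(experiment_name)

# look up the run by its name; search_runs returns a pandas DataFrame
runs = mlflow.search_runs(filter_string=f"attributes.run_name = '{run_name}'")

# take the run_id of the first match as a plain string
run_id = runs["run_id"].iloc[0]

# re-open the previous run by its run_id
with mlflow.start_run(run_id=run_id):

    # print(mlflow.active_run().info)

    # change an existing param
    mlflow.log_param("accuracy", 0.90)
    # add a new param
    mlflow.log_param("dropout", 0.05)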
  

This code works for both cases on MLflow v1.13 to v1.19. On versions newer than v1.19, mlflow.log_param("accuracy", 0.90) fails with the error "mlflow.exceptions.RestException: INVALID_PARAMETER_VALUE: Changing param values is not allowed." This GitHub issue explains why the feature was deprecated.

Adding new params, such as mlflow.log_param("dropout", 0.05), to a previous run works in all versions.
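
One way to stay clear of that error on newer versions is to compare the params you intend to log with those already stored on the run, and only log the keys that are not there yet. Below is a minimal sketch, assuming the stored params come from mlflow.get_run(run_id).data.params (a dict with string values). The helper name split_params is hypothetical and not part of MLflow:

```python
def split_params(existing, proposed):
    """Split proposed params into new keys and keys whose stored value would change.

    Stored param values are strings, so the comparison uses str(value).
    """
    to_add, conflicts = {}, {}
    for key, value in proposed.items():
        if key not in existing:
            to_add[key] = value
        elif existing[key] != str(value):
            # (stored value, proposed value) for keys that would be changed
            conflicts[key] = (existing[key], str(value))
        # same key with the same value: nothing to log
    return to_add, conflicts


# Stand-in for mlflow.get_run(run_id).data.params
existing = {"accuracy": "0.85"}
to_add, conflicts = split_params(existing, {"accuracy": 0.90, "dropout": 0.05})
print(to_add)
print(conflicts)
```

Inside the with mlflow.start_run(run_id=run_id) block you could then call mlflow.log_params(to_add) and decide separately what to do with the entries in conflicts.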

Upvotes: 0

mirekphd

Reputation: 6781

As Hans Bambel pointed out to me, and as documented here, mlflow.start_run (in contrast to mlflow.ActiveRun) accepts the run_id parameter of an existing run.

Here's an example tested to work in v1.13 through v1.19. As you can see, one can even log a corrected value for an existing metric to fix a mistake:

with mlflow.start_run(run_id=final_model_mlflow_run_id):
    
    # print(mlflow.active_run().info)
    
    mlflow.log_param("start_run_test", "This is a test")
    mlflow.log_metric("start_run_test", 1.23)
    mlflow.log_metric("start_run_test", 1.33)
    mlflow.log_artifact("/home/jovyan/_tmp/formula-features-20201103.json", "start_run_test")
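
If you would rather not re-open the run as the active run at all, the lower-level client can write to a run by its ID, which may suit separate post-processing scripts better. This is a minimal sketch, assuming MlflowClient.log_metric(run_id, key, value) from mlflow.tracking. The helper name add_metrics and its optional client argument are hypothetical and only there to make the function easy to exercise without a tracking server:

```python
def add_metrics(run_id, metrics, client=None):
    """Log every entry of `metrics` (a dict of name -> number) to run `run_id`."""
    if client is None:
        # Imported lazily so the helper can also be called with a stand-in client.
        from mlflow.tracking import MlflowClient
        client = MlflowClient()  # picks up the configured tracking URI
    for key, value in metrics.items():
        client.log_metric(run_id, key, float(value))


# Example call (needs a reachable tracking server and an existing run ID):
# add_metrics(final_model_mlflow_run_id, {"start_run_test": 1.33})
```

Because nothing becomes the active run here, calls to mlflow.log_* elsewhere in the same script are unaffected.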

Upvotes: 7
