MLflow saving weights after each epoch

Question

I have been testing some small examples with MLflow tracking, but for my use case I would like to have the weights saved after each epoch. Sometimes I kill the runs before they are completely finished (I cannot use early stopping), and what I experience now is that the weights do not get saved to the tracking UI server. Is there a way to do this after each epoch?

Raphael K · Accepted Answer

Save the weights to disk at the end of each epoch and then log them as an artifact. As long as the checkpoints/weights are saved to disk, you can log them with mlflow.log_artifact() or mlflow.log_artifacts(). From the docs:

mlflow.log_artifact() logs a local file or directory as an artifact, optionally taking an artifact_path to place it in within the run’s artifact URI. Run artifacts can be organized into directories, so you can place the artifact in a directory this way.

mlflow.log_artifacts() logs all the files in a given directory as artifacts, again taking an optional artifact_path.
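
As a rough illustration of the per-epoch pattern, here is a minimal sketch. It assumes a toy training loop with a placeholder weights dict written as JSON; the helper names (save_checkpoint, train, run_with_mlflow), the file naming, and the "checkpoints" artifact directory are all made up for this example. In a real project you would replace the JSON dump with your framework's own save call, or put the same two steps in an end-of-epoch callback.

```python
import json
import tempfile
from pathlib import Path


def save_checkpoint(weights, epoch, ckpt_dir):
    """Write this epoch's weights to disk and return the file path."""
    # Hypothetical naming scheme; JSON stands in for a real weights format.
    path = Path(ckpt_dir) / f"weights_epoch_{epoch:03d}.json"
    path.write_text(json.dumps(weights))
    return path


def train(num_epochs, ckpt_dir, log_artifact):
    """Toy loop: checkpoint to disk, then log the file, once per epoch."""
    weights = {"w": 0.0}  # placeholder for real model weights
    for epoch in range(num_epochs):
        weights["w"] += 0.1  # placeholder for a real training step
        path = save_checkpoint(weights, epoch, ckpt_dir)
        # Log immediately, so a killed run still keeps every finished epoch.
        log_artifact(str(path), artifact_path="checkpoints")


def run_with_mlflow(num_epochs=3):
    """Wire the loop to a real MLflow run (requires MLflow to be installed)."""
    import mlflow  # imported lazily so the helpers above work without it

    with mlflow.start_run(), tempfile.TemporaryDirectory() as tmp:
        train(num_epochs, tmp, mlflow.log_artifact)
```

Because each file is logged inside the loop rather than after training, the artifacts for all completed epochs should already be in the run when the process is interrupted. Passing the logging function in as a parameter is only a convenience for trying the loop without a tracking server.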
