firstranker14

Reputation: 13

MLflow artifact storage to AWS S3

Is there any way to store the artifacts logged by MLflow in AWS S3?

mlflow server \
    --backend-store-uri /mnt/persistent-disk \
    --default-artifact-root s3://my-mlflow-bucket/ \
    --host 0.0.0.0

Is it possible to provide only default-artifact-root instead of both backend-store-uri and default-artifact-root?

Also, is there any way to set default-artifact-root programmatically from MlflowClient or MlflowContext instead of running the mlflow server command line?

FYI, I have already defined AWS_ACCESS_KEY and AWS_SECRET_KEY in my environment variables and exported the endpoint for S3.

Is logArtifacts from the ActiveRun class the correct method to set the artifact_uri that points to an AWS S3 bucket?

Upvotes: 1

Views: 7024

Answers (2)

Jules Damji

Reputation: 187

You can, though, set the tracking URI programmatically on the client side so that experiments are logged to a remotely launched server. If you are using an SQLAlchemy-compatible database as the backend store, then both arguments are needed.
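
For example, a minimal client-side sketch (assuming the tracking server is reachable at http://127.0.0.1:5000; the experiment name is made up):

import mlflow

# Point the client at the remote tracking server; new experiments created
# through it store artifacts under the server's --default-artifact-root.
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("my-s3-experiment")  # hypothetical experiment name

with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)
    mlflow.log_metric("rmse", 0.42)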

For example, on my localhost where I use sqlite:///mlruns.db, I can launch the server as:

mlflow server --backend-store-uri sqlite:///mlruns.db --default-artifact-root ./mlruns 

[2020-03-07 23:06:42 -0800] [3698] [INFO] Starting gunicorn 20.0.4
[2020-03-07 23:06:42 -0800] [3698] [INFO] Listening at: http://127.0.0.1:5000 (3698)
[2020-03-07 23:06:42 -0800] [3698] [INFO] Using worker: sync
[2020-03-07 23:06:42 -0800] [3701] [INFO] Booting worker with pid: 3701
[2020-03-07 23:06:42 -0800] [3702] [INFO] Booting worker with pid: 3702
[2020-03-07 23:06:42 -0800] [3703] [INFO] Booting worker with pid: 3703
[2020-03-07 23:06:42 -0800] [3704] [INFO] Booting worker with pid: 3704

As a side note, if you plan to run more than one tracking server, say because multiple data science teams are using MLflow, each with its own backend-store-uri and default-artifact-root, consider a shell script wrapper that reads the respective arguments from a config file.

my_script -f ds_team1.config
my_script -f ds_team2.config

Each team's config file holds the respective credentials, port, and arguments for the mlflow server, as in the sketch below.
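
A rough sketch of such a wrapper in Python (the config file layout and the [server] section keys are made up for illustration):

#!/usr/bin/env python
# Hypothetical wrapper: read per-team settings from a config file and
# launch `mlflow server` with them.
import argparse
import configparser
import subprocess

parser = argparse.ArgumentParser()
parser.add_argument("-f", "--config", required=True)
args = parser.parse_args()

cfg = configparser.ConfigParser()
cfg.read(args.config)
server = cfg["server"]  # e.g. a [server] section in ds_team1.config

subprocess.run([
    "mlflow", "server",
    "--backend-store-uri", server["backend_store_uri"],
    "--default-artifact-root", server["default_artifact_root"],
    "--host", server.get("host", "0.0.0.0"),
    "--port", server.get("port", "5000"),
], check=True)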

Finally, mlflow.log_artifact() is what you want for logging artifacts.
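
For example, a minimal sketch under the server setup above (the file name is made up; with an S3 default artifact root, the file ends up in the bucket):

import mlflow

mlflow.set_tracking_uri("http://127.0.0.1:5000")

with mlflow.start_run():
    # Write a local file, then upload it to the run's artifact location,
    # which resolves against the server's --default-artifact-root.
    # The client uploads directly to S3, so the AWS credentials must be
    # set in this process's environment.
    with open("model_summary.txt", "w") as f:
        f.write("example artifact\n")
    mlflow.log_artifact("model_summary.txt")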


Upvotes: -4

JRV

Reputation: 1

I think it should work as you are doing it, provided you run this first: mlflow.set_tracking_uri("http://127.0.0.1:5000")

Upvotes: 0
