Reputation: 13
Is there any way to store the logs recorded by MLflow in AWS S3?
mlflow server \
--backend-store-uri /mnt/persistent-disk \
--default-artifact-root s3://my-mlflow-bucket/ \
--host 0.0.0.0
Is it possible to provide only default-artifact-root instead of providing both backend-store-uri and default-artifact-root?
Also, is there any way to set default-artifact-root programmatically from MlflowClient or MlflowContext instead of running the mlflow server command line?
FYI, I have already defined AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in my environment variables, and exported the endpoint settings for S3.
Is logArtifacts from the ActiveRun class the correct method for setting the artifact_uri so that it points to the AWS S3 bucket?
Upvotes: 1
Views: 7024
Reputation: 187
You can, though, set the tracking URI programmatically on the client side to log experiments to a remotely launched server. If you are using an SQLAlchemy-compatible database as the backend store, then both arguments are needed.
For example, on my localhost where I use sqlite:///mlruns.db, I can launch the server as:
mlflow server --backend-store-uri sqlite:///mlruns.db --default-artifact-root ./mlruns
[2020-03-07 23:06:42 -0800] [3698] [INFO] Starting gunicorn 20.0.4
[2020-03-07 23:06:42 -0800] [3698] [INFO] Listening at: http://127.0.0.1:5000 (3698)
[2020-03-07 23:06:42 -0800] [3698] [INFO] Using worker: sync
[2020-03-07 23:06:42 -0800] [3701] [INFO] Booting worker with pid: 3701
[2020-03-07 23:06:42 -0800] [3702] [INFO] Booting worker with pid: 3702
[2020-03-07 23:06:42 -0800] [3703] [INFO] Booting worker with pid: 3703
[2020-03-07 23:06:42 -0800] [3704] [INFO] Booting worker with pid: 3704
As a side note, if for any reason you plan to run more than one tracking server (say, multiple data science teams are using MLflow), each with its own backend-store-uri and default-artifact-root, consider a shell script wrapper that reads the respective arguments from a config file:
my_script -f ds_team1.config
my_script -f ds_team2.config
Each team's config file would hold its respective credentials, port, and arguments for the mlflow server.
Finally, mlflow.log_artifact() is what you want for logging artifacts.
Upvotes: -4
Reputation: 1
I think it should work as you are doing it, if you run this first: mlflow.set_tracking_uri("http://127.0.0.1:5000")
Upvotes: 0