Reputation: 2351
I am trying to set up an MLflow tracking server on a remote machine as a systemd service. I have an SFTP server running and have created an SSH key pair.
Everything seems to work fine except the artifact logging. MLflow does not seem to have permission to list the artifacts saved in the mlruns directory.
I create an experiment and log artifacts in this way:
import mlflow

uri = 'http://192.XXX:8000'
mlflow.set_tracking_uri(uri)
mlflow.create_experiment('test', artifact_location='sftp://192.XXX:_path_to_mlruns_folder_')
experiment = mlflow.get_experiment_by_name('test')
with mlflow.start_run(experiment_id=experiment.experiment_id, run_name=run_name) as run:
    mlflow.log_param(_parameter_name_, _parameter_value_)
    mlflow.log_artifact(_an_artifact_, _artifact_folder_name_)
I can see the metrics in the UI and the artifacts in the correct destination folder on the remote machine. However, in the UI I receive this message when trying to see the artifacts:
Unable to list artifacts stored under sftp://192.XXX:path_to_mlruns_folder/run_id/artifacts for the current run. Please contact your tracking server administrator to notify them of this error, which can happen when the tracking server lacks permission to list artifacts under the current run's root artifact directory.
I cannot figure out why, as the mlruns folder has drwxrwxrwx permissions and all the subfolders have drwxrwxr-x. What am I missing?
UPDATE
Looking at it with fresh eyes, it seems weird that it tries to list files through sftp://192.XXX: when it should just look in the folder _path_to_mlruns_folder_/_run_id_/artifacts. However, I still do not know how to circumvent that.
Upvotes: 3
Views: 4148
Reputation: 2351
The problem seems to be that, by default, a systemd service runs as root, so the server presumably attempts the SFTP connection with root's SSH credentials. Specifying a user in the unit file and creating an SSH key pair for that user to access the same remote machine worked.
[Unit]
Description=MLflow server
After=network.target
[Service]
Restart=on-failure
RestartSec=20
User=_user_
Group=_group_
ExecStart=/bin/bash -c 'PATH=_yourpath_/anaconda3/envs/mlflow_server/bin/:$PATH exec mlflow server --backend-store-uri postgresql://mlflow:mlflow@localhost/mlflow --default-artifact-root sftp://_user_@_host_:_yourotherpath_/MLFLOW_SERVER/mlruns -h 0.0.0.0 -p 8000'
[Install]
WantedBy=multi-user.target
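To check whether the account the service runs as actually has a key pair, something like the following may help. It is only a rough sketch: the helper name private_keys is made up for illustration, and it assumes the keys follow the default OpenSSH naming (id_rsa, id_ed25519, ...) inside ~/.ssh. Keys stored elsewhere or referenced from ~/.ssh/config will not be found by it.

```python
from pathlib import Path
from typing import List


def private_keys(ssh_dir: Path) -> List[Path]:
    """Return files in ssh_dir that look like private keys.

    Assumption: default OpenSSH naming, i.e. id_* files without
    the .pub suffix. This is a heuristic, not a validity check.
    """
    if not ssh_dir.is_dir():
        # No ~/.ssh at all, e.g. an account that never ran ssh-keygen.
        return []
    return sorted(
        p for p in ssh_dir.glob("id_*")
        if p.is_file() and p.suffix != ".pub"
    )


# Usage, run as the same account as the service:
#   print(private_keys(Path.home() / ".ssh"))
```

If this prints an empty list when run as the service's account, that account has no default-named key to offer, which would be consistent with the listing error in the UI.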
_user_ and _group_ should be the same as those listed by ls -la for the mlruns directory.
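The same owner and group can also be read from Python, which is handy if you want to script the check. This is a minimal sketch for Unix-like systems only (pwd and grp are not available on Windows), and owner_and_group is a hypothetical helper name, not part of MLflow.

```python
import grp
import os
import pwd
from typing import Tuple


def owner_and_group(path: str) -> Tuple[str, str]:
    """Return the (user, group) names that own path, like `ls -la` shows."""
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # No passwd entry for this uid: fall back to the numeric id.
        user = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        # No group entry for this gid: fall back to the numeric id.
        group = str(st.st_gid)
    return user, group


# Usage, with the placeholder path from the unit file:
#   owner_and_group("_yourotherpath_/MLFLOW_SERVER/mlruns")
```

The two values returned are what would go into User= and Group= in the unit file.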
Upvotes: 2