Riley Hun
Riley Hun

Reputation: 2775

How Do You "Permanently" Delete An Experiment In Mlflow?

Permanent deletion of an experiment isn't documented anywhere. I'm using Mlflow w/ backend postgres db

Here's what I've run:

client = MlflowClient(tracking_uri=server)
client.delete_experiment(1)

This deletes the the experiment, but when I run a new experiment with the same name as the one I just deleted, it will return this error:

mlflow.exceptions.MlflowException: Cannot set a deleted experiment 'cross-sell' as the active experiment. You can restore the experiment, or permanently delete the  experiment to create a new one.

I cannot find anywhere in the documentation that shows how to permanently delete everything.

Upvotes: 30

Views: 36043

Answers (9)

abhishek gopinath
abhishek gopinath

Reputation: 11

Adding to @dominik-franek,

DELETE FROM datasets WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted');

This is also required. Im using MLFlow 2.14.x

Upvotes: 0

Hajar Razip
Hajar Razip

Reputation: 593

Just leaving this here for the mssql folks who might need this:

DELETE FROM mlflow.dbo.experiment_tags WHERE experiment_id in (
    SELECT experiment_id FROM mlflow.dbo.experiments where lifecycle_stage='deleted'
    );
DELETE FROM mlflow.dbo.latest_metrics WHERE run_uuid in (
    SELECT run_uuid FROM mlflow.dbo.runs WHERE experiment_id in (
        SELECT experiment_id FROM mlflow.dbo.experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM mlflow.dbo.metrics WHERE run_uuid in (
    SELECT run_uuid FROM mlflow.dbo.runs WHERE experiment_id in (
        SELECT experiment_id FROM mlflow.dbo.experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM mlflow.dbo.tags WHERE run_uuid in (
    SELECT run_uuid FROM mlflow.dbo.runs WHERE experiment_id in (
        SELECT experiment_id FROM mlflow.dbo.experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM mlflow.dbo.params WHERE run_uuid in (
    SELECT run_uuid FROM mlflow.dbo.runs where experiment_id in (
        SELECT experiment_id FROM mlflow.dbo.experiments where lifecycle_stage='deleted'
));
DELETE FROM mlflow.dbo.runs WHERE experiment_id in (
    SELECT experiment_id FROM mlflow.dbo.experiments where lifecycle_stage='deleted'
);
DELETE FROM mlflow.dbo.experiments where lifecycle_stage='deleted';

Upvotes: 0

wedesoft
wedesoft

Reputation: 2989

If you use SQlite, you can permanently delete experiment 42 as follows:

mlflow gc --backend-store-uri sqlite:////path/to/mlflow.db --experiment-ids 42

Also see mlflow gc documentation.

Upvotes: 1

JacoSolari
JacoSolari

Reputation: 1394

If you are using S3 as backend store for artifacts and have a EC2 server for tracking, this is my workaround to delete full experiments 'folders'. You have to delete the experiment both on the artifact store (S3) and on the backend store (database hosted on EC2)

Permanently delete experiments via list of experiment ids:

def permanently_delete_experiments_on_mlflow(list_of_experiments_id: list):
    mlflow_client = MlflowClient(tracking_uri=YOUR_EC2_TRACKING_URI)
    commands = []
    for experiment_id in list_of_experiments_id:
        print(f'deleting experiment {experiment_id}')
        os.system(f"aws s3 rm {YOUR_S3_ARTIFACTS_STORE} "
                  f"--recursive --exclude '*' --include '{experiment_id}/*'")
        try:
            mlflow_client.delete_experiment(experiment_id)
        except Exception as e:
            print_red(f'failed to execute mlflow_client.delete_experiment({experiment_id}) \n {str(e)}')
        commands.append(f"YOUR_PATH_TO_DATABASE_ON_EC2{os.sep}database.db{os.sep}{experiment_id} ")
        commands.append(f"YOUR_PATH_TO_DATABASE_ON_EC2{os.sep}database.db{os.sep}.trash{os.sep}{experiment_id} ")
    # format commands to send via ssh to EC2
    commands = f"ssh -i {YOUR_EC2_SSH_KEY_PATH} ubuntu@{YOUR_EC2_IP} rm -r " \
               + ' '.join(commands)
    print('executing on EC2 the following command: \n   ', commands)
    result = subprocess.Popen(commands, shell=True, stdout=subprocess.PIPE, stdin=subprocess.PIPE)
    response, err = result.communicate()
    print('response:', response)

Note that for this to work you need to have AWS CLI installed.

It basically runs a shell command from Python that does the trick. As a side note, the mlflow tracking with EC2 creates 'folders' both on the EC2 database and on S3 named according to the experiment id which contains a 'subfolder' for each run id corresponding to that experiment. The code above relies on this structure.

Upvotes: 1

Dominik Franek
Dominik Franek

Reputation: 351

I am adding SQL commands if you want to delete permanently Trash of MLFlow if you are using PostgreSQL as backend storage.

Change to your MLFlow Database, e.g. by using: \c mlflow and then:

DELETE FROM experiment_tags WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM latest_metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM tags WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM params WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs where experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
));
DELETE FROM runs WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM experiments where lifecycle_stage='deleted';

The difference is, that I added the 'params' Table SQL Delete command there.

Upvotes: 14

Yash Nag
Yash Nag

Reputation: 1245

Extending @Lee Netherton's answer, you can use PyMySQL to execute those queries and remove all metadata from MLFlow tracking server after deleting the experiment from the MLFlow tracking client.

import pymysql

def perm_delete_exp():
    connection = pymysql.connect(
        host='localhost',
        user='user',
        password='password',
        db='mlflow',
        cursorclass=pymysql.cursors.DictCursor)
    with connection.cursor() as cursor:
        queries = """
            USE mlflow;
            DELETE FROM experiment_tags WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted");
            DELETE FROM latest_metrics WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
            DELETE FROM metrics WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
            DELETE FROM tags WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
            DELETE FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted");
            DELETE FROM experiments where lifecycle_stage="deleted";
        """
        for query in queries.splitlines()[1:-1]:
            cursor.execute(query.strip())
    connection.commit()
    connection.close()

You can (perhaps should) execute the entire query at once, but I found debugging it easier this way.

Upvotes: 4

MaGarb
MaGarb

Reputation: 43

Unfortunately, the SQL commands above did not work with SQLITE in my case. Here is the SQL version working with sqlite in a database IDE by replacing the "any" commands with "in":

DELETE FROM experiment_tags WHERE experiment_id in (
    SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    );
DELETE FROM latest_metrics WHERE run_uuid in (
    SELECT run_uuid FROM runs WHERE experiment_id in (
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM metrics WHERE run_uuid in (
    SELECT run_uuid FROM runs WHERE experiment_id in (
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM tags WHERE run_uuid in (
    SELECT run_uuid FROM runs WHERE experiment_id in (
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM params WHERE run_uuid in (
    SELECT run_uuid FROM runs where experiment_id in (
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
));
DELETE FROM runs WHERE experiment_id in (
    SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM experiments where lifecycle_stage='deleted';

Upvotes: 1

Moore
Moore

Reputation: 623

As of mlflow 1.11.0, the recommended way to permanently delete runs within an experiment is: mlflow gc [OPTIONS].

From the documentation, mlflow gc will

Permanently delete runs in the deleted lifecycle stage from the specified backend store. This command deletes all artifacts and metadata associated with the specified runs.

Upvotes: 24

Lee Netherton
Lee Netherton

Reputation: 22482

Unfortunately it seems there is no way to do this via the UI or CLI at the moment :-/

The way to do it depends on the type of backend file store that you are using.

Filestore:

If you are using the filesystem as a storage mechanism (the default) then it is easy. The 'deleted' experiments are moved to a .trash folder. You just need to clear that out:

rm -rf mlruns/.trash/*

As of the current version of the documentation (1.7.2), they remark:

It is recommended to use a cron job or an alternate workflow mechanism to clear .trash folder.

SQL Database:

This is more tricky, as there are dependencies that need to be deleted. I am using MySQL, and these commands work for me:

USE mlflow_db;  # the name of your database
DELETE FROM experiment_tags WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
);
DELETE FROM latest_metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
    )
);
DELETE FROM metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
    )
);
DELETE FROM tags WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
    )
);
DELETE FROM runs WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
);
DELETE FROM experiments where lifecycle_stage="deleted";

Upvotes: 30

Related Questions