jay_t
jay_t

Reputation: 3793

delete all records except the id I have in a python list

I want to delete all records in a mysql db except the record id's I have in a list. The length of that list can vary and could easily contain 2000+ id's, ...

Currently I convert my list to a string so it fits in something like this: cursor.execute("""delete from table where id not in (%s)""",(list)) Which doesn't feel right and I have no idea how long list is allowed to be, ....

What's the most efficient way of doing this from python?

Altering the structure of table with an extra field to mark/unmark records for deletion would be great but not an option. Having a dedicated table storing the id's would indeed be helpful then this can just be done through a sql query... but I would really like to avoid these options if possible.

Thanks,

Upvotes: 3

Views: 4046

Answers (4)

ceteras
ceteras

Reputation: 3378

I'd add a "todelete tinyint(1) not null default 1" column to the table, update it to 0 for those id's which have to be kept, then delete from table where todelete;. It's faster than not in.

Or, create a table with the same structure as yours, insert the kept rows there and rename tables. Then, drop the old one.

Upvotes: 0

msw
msw

Reputation: 43487

That's what temporary tables are for. You create a temporary table containing your exclusion list and use the DBM to do your selection for you. A simple example:

CREATE TABLE words (id integer primary key not null, word string);
CREATE TEMPORARY TABLE exclusion (word string);
INSERT INTO words VALUES ... # 100,000 of these
INSERT INTO exclusion VALUES ... # 1000 of these
DELETE FROM words WHERE words.word NOT IN (SELECT word FROM exclusion);
# 99,000 records now in words, table exclusion evaporates when the session is over

Someone who actually knows SQL can probably improve on my last line. If you are doing selections in application space, something is wrong. MySQL has temporary tables, but even if you didn't a CREATE/DROP exclusions would still be better than an overlong statement.

Incidentally, I hacked this up in Python only because I had a huge wordlist handy. The code is boring so not posted.

Upvotes: -2

unutbu
unutbu

Reputation: 879251

If the db table is not too large, just read in all the ids, and make a list of the ones you want to delete:

keep_ids=[...]
cursor.execute('SELECT id FROM table')
delete_ids=[]
for (row_id,) in cursor:
    if row_id not in keep_ids:
        delete_ids.append(row_id)
cursor.executemany('DELETE FROM table WHERE id = %s',delete_ids)

If the db table is huge, then recreate the table:

keep_ids=[...]
cursor.execute('CREATE TABLE IF NOT EXISTS temp_table LIKE table')
cursor.executemany('INSERT INTO temp_table (SELECT * FROM table WHERE id = %s)',keep_ids)
cursor.execute('DROP TABLE table')
cursor.execute('ALTER TABLE temp_table RENAME table')

Upvotes: 4

apg
apg

Reputation: 2649

Have you exhausted the possibility of doing your computation in SQL directly? If so, I don't see another way to do this without doing what you are already doing. Be sure, of course, that you are creating valid SQL, which if you're plugging in:

','.join(str(int(x)) for x in ids)

you certainly are, if substituted in your statement directly. I'm not sure if there's a limit to the number of ids in the NOT IN (...) clause but would doubt it, since you can use an arbitrarily long list when using a subquery to populate that list.

Upvotes: 1

Related Questions