johanpm
johanpm

Reputation: 13

Best way to write SQL delete statement, deleting pairs of records

I have a MySQL database with just 1 table: Fields are: blocknr (not unique), btcaddress (not unique), txid (not unique), vin, vinvoutnr, netvalue.

Indexes exist on both btcaddress and txid.

Data in it looks like this: enter image description here

I need to delete all "deletable" record pairs. An example is given in red. Conditions are:

In a table of 36M records, about 33M records will be deleted.

I've used this:

delete t1 
from registration t1 
inner join registration t2 
where t1.txid=t2.txid and t1.vinvoutnr=t2.vinvoutnr and t1.vin<>t2.vin;

It works but takes 5 hours.

Maybe this would work too (not tested yet):

delete t1 
from registration as t1, registration as t2 
where t1.txid=t2.txid and t1.vinvoutnr=t2.vinvoutnr and t1.vin<>t2.vin;

Or do I forget about a delete query and try to make a new table with all non-delatables in and then drop the original ?

Database can be offline for this delete query.

Upvotes: 1

Views: 323

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270513

Based on your question, you are deleting most of the rows in the table. That is just really expensive. A better approach is to empty the table and re-populate it:

create table temp_registration as
    <query for the rows to keep here>;

truncate table registration;

insert into registration
    select *
    from temp_registration;

Your logic is a bit hard to follow, but I think the logic on the rows to keep is:

select r.*
from registration r
where not exists (select 1
                  from registration r2
                  where r2.txid = r.txid and
                        r2.vinvoutnr = r.vinvoutnr and
                        r2.vin <> r.vin
                 );

For best performance, you want an index on registration(txid, vinvoutnr, vin).

Upvotes: 1

Omini
Omini

Reputation: 9

Given that you expect to remove the majority of your data it does sound like the simplest approach would be to create a new table with the correct data and then drop the original table as you suggest. Otherwise ADyson's corrections to the JOIN query might help to alleviate the performance issue.

Upvotes: 0

Related Questions