Volker
Volker

Reputation: 507

Delete rows while keeping the one with oldest date entry using SQL (MySQL)

I have a few million database rows in MySQL InnoDB database which need to get cleaned up. Look at this example.

SELECT archiveid, clearingid, pickdate 
  FROM tblclearingarchive 
  WHERE clearingid = 30978729 
  ORDER BY pickdate;


+-----------+------------+---------------------+
| archiveid | clearingid | pickdate            |
+-----------+------------+---------------------+
|  34328367 |   30978729 | NULL                | *
|  34333844 |   30978729 | 2015-10-27 15:55:30 | <- keep only this row with oldest date
|  34438038 |   30978729 | 2016-03-01 10:34:25 | *
|  34481472 |   30978729 | 2016-04-20 13:44:19 | *
+-----------+------------+---------------------+
4 rows in set (0.01 sec)

So I know the clearingid value(s) of the affected field(s) and want to remove the one with no pickdate (null) and the two lines which are redundant (later, after first pick). In the example above, the ones marked with * should get deleted.

Any hints about how such SQL update/delete might look like?

There are about 30M rows and about 250K rows (known clearingid's) to clean up.

With the initial idea of Matias Barrios I found this solution to verify. It perfectly lists the rows I want to delete:

SELECT archiveid, clearingid, pickdate 
  FROM tblclearingarchive 
  WHERE clearingid = 30978729 
  AND (pickdate NOT IN (SELECT MIN(pickdate)
                        FROM tblclearingarchive 
                        WHERE clearingid = 30978729 ) 
       OR pickdate is NULL)
  ORDER BY pickdate;

+-----------+------------+---------------------+
| archiveid | clearingid | pickdate            |
+-----------+------------+---------------------+
|  34328367 |   30978729 | NULL                |
|  34438038 |   30978729 | 2016-03-01 10:34:25 |
|  34481472 |   30978729 | 2016-04-20 13:44:19 |
+-----------+------------+---------------------+
3 rows in set (0.20 sec)

But I fail to delete using this sort of query:

DELETE FROM tblclearingarchive 
  WHERE clearingid = 30978729 
  AND (pickdate NOT IN (SELECT MIN(pickdate)
                        FROM tblclearingarchive 
                        WHERE clearingid = 30978729 ) 
      OR pickdate is NULL);

ERROR 1093 (HY000): You can't specify target table 'tblclearingarchive' for update in FROM clause

Upvotes: 2

Views: 877

Answers (3)

Matias Barrios
Matias Barrios

Reputation: 5056

This should do as you intend.

DELETE FROM tblclearingarchive
  WHERE archiveid IN (
    SELECT * FROM ( SELECT archiveid, clearingid, pickdate 
      FROM tblclearingarchive 
      WHERE clearingid = 30978729 
      AND pickdate NOT IN ( SELECT max(pickdate)
                        FROM tblclearingarchive LIMIT 1)
      ORDER BY pickdate) tmpTable
   ) OR  pickdate IS NULL

Let me know if it works.

Upvotes: 0

VBoka
VBoka

Reputation: 9083

You will need to do this:

Create "assistance table" so you can do the delete

create table test as 
select * from tblclearingarchive

Then do the delete:

delete from tblclearingarchive t1
where t1.pickdate <> (select min(t.pickdate) 
                     from test t
                     group by clearingid)
or t1.pickdate is null;

Here is a small demo

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269503

If you want the first, then you can filter:

select t.*
from t
where t.pickdate = (select min(t2.pickdate) 
                    from t t2
                    where t2.clearingid = t.clearingid
                   );

Or, if you just want one row:

select t.*
from t
where t.pickdate is not null
order by t.pickdate
limit 1;

EDIT:

If you want to actually modify the table:

delete t
    from t join
         (select clearingid, min(pickdate) as min_pickdate
          from t
          group by clearingid
         ) c
         on t.clearingid = c.clearingid
    where t.pickdate > c.min_pickdate or t.pickdate is null;

Upvotes: 0

Related Questions