Saad
Saad

Reputation: 1360

SQL: How to delete rows from table based on a criteria

I have the following table:

tbl
source    type    date
---       ---     ---
google    A       2010-02-25
google    A       2013-04-11
facebook  C       2008-10-22
facebook  C       2007-01-28

I want to keep only a single entry of each source, and the criteria is select the source tuple with min(date) group by source. The table consists of millions of records, and I'm looking for an efficient way to delete redundant records.

Upvotes: 1

Views: 5983

Answers (4)

Gaetano Piazzolla
Gaetano Piazzolla

Reputation: 1567

Add an identity column to the duplicate table as a serial number that acts as a row unique identifier(auto incremental ascending order):

 alter table tbl add sno int identity(1,1)

table

This query selects only non duplicated rows with min(date):

(select min(date),sno From tbl group by source) 

So "sno" will be equals to "1" and "4".

Now join with this table, and delete the records of the join that are duplicated (t.sno is null)

delete E from tbl E
    left join
    (select min(date),sno From tbl group by source) T on E.sno=T.sno
where T.sno is null

table3

Solution adapted from method 3 of this link: LINK

Upvotes: 0

Nayan Sharma
Nayan Sharma

Reputation: 1853

delete from t where date not in (select al.d from (select min(date) as d  from t group by source )al);

Upvotes: 0

Madhukar
Madhukar

Reputation: 1242

In Microsoft SQL, you can try this.

;
WITH cte 
        AS (SELECT ROW_NUMBER() OVER (PARTITION BY source, type
                                        ORDER BY createdate) RN
            FROM   tbsource)
DELETE FROM cte
WHERE  RN > 1;

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1271111

In MySQL, you can do this using a join:

delete t
    from t join
         (select source, min(date) as mindate
          from t
          group by source
         ) tt
         on t.source = tt.source
    where t.date > tt.mindate;

The only way -- off-hand -- that I can think to make this more efficient is to store the aggregation result in a subquery and add an index to it.

I can also add that regardless of the computation for determining the rows to delete, deleting lots of rows in a table in inefficient. Usually, I would recommend a three-step approach:

  1. Write a query to generate the table you want and store the results in a temporary table.
  2. Truncate the original table.
  3. Re-insert the (much) smaller number of rows.

Upvotes: 3

Related Questions