Reputation: 65

Deleting duplicate records with multiple column with multiple conditions

I have Table1 follows:

----------------------------------
| Id   |     Value  |     Date   |
----------------------------------
| 1    |      xxx   | 05/01/2015 |
| 2    |      xxx   | 05/02/2015 |
| 3    |      yyy   | 06/01/2015 |
| 4    |      yyy   | 06/01/2015 |
----------------------------------

Delete the duplicate row using latest date, if the date is equal delete the duplicate row using latest ID. (In other words, keep the latest date and latest id, delete old date and id)

No programming, only query. The table is one of the join table in a multiple join query.

Should be compatible with Vertica.

Upvotes: 0

Answers (3)

marcothesane

Reputation: 6721

If you want to join this table with something else, you might also just want to just have the rows you need without having to delete stuff before joining.

Vertica offers the analytic limit clause , which could come in handy here.

Here's how it would work with your input data:

WITH
input(Id,Value,Date) AS (
          SELECT 1,'xxx',DATE '2015-05-01'
UNION ALL SELECT 2,'xxx',DATE '2015-05-02'
UNION ALL SELECT 3,'yyy',DATE '2015-06-01'
UNION ALL SELECT 4,'yyy',DATE '2015-06-01'
)
SELECT
 *
FROM input
LIMIT 1 OVER(PARTITION BY Value ORDER BY Date DESC, id DESC);
-- out  Id | Value |    Date    
-- out ----+-------+------------
-- out   2 | xxx   | 2015-05-02
-- out   3 | yyy   | 2015-06-01
-- out (2 rows)
-- out 
-- out Time: First fetch (2 rows): 14.240 ms. All rows formatted: 14.276 ms

Would this help ...?

Well, if you really need to delete, you can also use the above in a NOT IN predicate to run your delete ... like I'm doing here:

-- creating a temp table to delete from  ....
CREATE LOCAL TEMPORARY TABLE t1 (Id,Value,Date) 
ON COMMIT PRESERVE ROWS AS (   
          SELECT 1,'xxx',DATE '2015-05-01'
UNION ALL SELECT 2,'xxx',DATE '2015-05-02'
UNION ALL SELECT 3,'yyy',DATE '2015-06-01'
UNION ALL SELECT 4,'yyy',DATE '2015-06-01'
);
-- delete as announced ..
DELETE FROM t1 WHERE id NOT IN (
  SELECT
    id
  FROM t1
  LIMIT 1 OVER(PARTITION BY Value ORDER BY Date DESC, id DESC)
);
-- check the content now ...
SELECT * FROM t1;
-- out CREATE TABLE
-- out Time: First fetch (0 rows): 16.081 ms. All rows formatted: 
-- 16.110 ms
-- out  OUTPUT 
-- out --------
-- out       2
-- out (1 row)
-- out 
-- out Time: First fetch (1 row): 61.740 ms. All rows formatted:
--    61.788 ms
-- out  Id | Value |    Date    
-- out ----+-------+------------
-- out   2 | xxx   | 2015-05-02
-- out   3 | yyy   | 2015-06-01
-- out (2 rows)
-- out Time: First fetch (2 rows): 6.761 ms. 
-- All rows formatted: 6.814 ms

Upvotes: 1

Gordon Linoff

Reputation: 1269663

I think Vertica will support this:

delete table1
where table1.id not in (select t2.id
                        from (select t2.*
                                     row_number() over (partition by t2.value order by t2.date, t2.id desc) as seqnum
                              from table1 t2
                             )
                         where seqnum = 1
                        );

Upvotes: 1

Sarfaraz

Reputation: 146

The following statement deletes duplicate rows and keeps the highest id:

DELETE t1 FROM table1 t1
    INNER JOIN
    table1 t2 
WHERE
    t1.id < t2.id AND t1. Date = t2. Date;

may be helpful for you and you can modify according to your need

Upvotes: 2

Deleting duplicate records with multiple column with multiple conditions

Answers (3)

Related Questions