Reputation: 65
I have Table1 follows:
----------------------------------
| Id | Value | Date |
----------------------------------
| 1 | xxx | 05/01/2015 |
| 2 | xxx | 05/02/2015 |
| 3 | yyy | 06/01/2015 |
| 4 | yyy | 06/01/2015 |
----------------------------------
Delete the duplicate row using latest date, if the date is equal delete the duplicate row using latest ID. (In other words, keep the latest date and latest id, delete old date and id)
No programming, only query. The table is one of the join table in a multiple join query.
Should be compatible with Vertica.
Upvotes: 0
Views: 1744
Reputation: 6721
If you want to join this table with something else, you might also just want to just have the rows you need without having to delete stuff before joining.
Vertica offers the analytic limit clause , which could come in handy here.
Here's how it would work with your input data:
WITH
input(Id,Value,Date) AS (
SELECT 1,'xxx',DATE '2015-05-01'
UNION ALL SELECT 2,'xxx',DATE '2015-05-02'
UNION ALL SELECT 3,'yyy',DATE '2015-06-01'
UNION ALL SELECT 4,'yyy',DATE '2015-06-01'
)
SELECT
*
FROM input
LIMIT 1 OVER(PARTITION BY Value ORDER BY Date DESC, id DESC);
-- out Id | Value | Date
-- out ----+-------+------------
-- out 2 | xxx | 2015-05-02
-- out 3 | yyy | 2015-06-01
-- out (2 rows)
-- out
-- out Time: First fetch (2 rows): 14.240 ms. All rows formatted: 14.276 ms
Would this help ...?
Well, if you really need to delete, you can also use the above in a NOT IN predicate to run your delete ... like I'm doing here:
-- creating a temp table to delete from ....
CREATE LOCAL TEMPORARY TABLE t1 (Id,Value,Date)
ON COMMIT PRESERVE ROWS AS (
SELECT 1,'xxx',DATE '2015-05-01'
UNION ALL SELECT 2,'xxx',DATE '2015-05-02'
UNION ALL SELECT 3,'yyy',DATE '2015-06-01'
UNION ALL SELECT 4,'yyy',DATE '2015-06-01'
);
-- delete as announced ..
DELETE FROM t1 WHERE id NOT IN (
SELECT
id
FROM t1
LIMIT 1 OVER(PARTITION BY Value ORDER BY Date DESC, id DESC)
);
-- check the content now ...
SELECT * FROM t1;
-- out CREATE TABLE
-- out Time: First fetch (0 rows): 16.081 ms. All rows formatted:
-- 16.110 ms
-- out OUTPUT
-- out --------
-- out 2
-- out (1 row)
-- out
-- out Time: First fetch (1 row): 61.740 ms. All rows formatted:
-- 61.788 ms
-- out Id | Value | Date
-- out ----+-------+------------
-- out 2 | xxx | 2015-05-02
-- out 3 | yyy | 2015-06-01
-- out (2 rows)
-- out Time: First fetch (2 rows): 6.761 ms.
-- All rows formatted: 6.814 ms
Upvotes: 1
Reputation: 1269663
I think Vertica will support this:
delete table1
where table1.id not in (select t2.id
from (select t2.*
row_number() over (partition by t2.value order by t2.date, t2.id desc) as seqnum
from table1 t2
)
where seqnum = 1
);
Upvotes: 1
Reputation: 146
The following statement deletes duplicate rows and keeps the highest id:
DELETE t1 FROM table1 t1
INNER JOIN
table1 t2
WHERE
t1.id < t2.id AND t1. Date = t2. Date;
may be helpful for you and you can modify according to your need
Upvotes: 2