user2263025
user2263025

Reputation: 77

CTE with DELETE - Alternative for SQL Data Warehouse

I would like to delete all rows in a table where the batchId (a running number) older than the previous two. I could probably do this in a SQL Database with the query:

WITH CTE AS(
    SELECT
        *,
        DENSE_RANK() OVER(ORDER BY BATCHID DESC) AS RN
    FROM MyTable
)
DELETE FROM CTE WHERE RN>2

But the same is not allowed in a SQL Data Warehouse per this. Looking for alternatives here.

Upvotes: 2

Views: 3711

Answers (3)

wBob
wBob

Reputation: 14379

Azure SQL Data Warehouse only supports a limited T-SQL surface area and CTEs for DELETE operations and DELETEs with FROM clauses which will yield the following error:

Msg 100029, Level 16, State 1, Line 1
A FROM clause is currently not supported in a DELETE statement.

It does however support sub-queries so one way to write your statement like this:

DELETE dbo.MyTable
WHERE BATCHID Not In ( SELECT TOP 2 BATCHID FROM dbo.MyTable ORDER BY BATCHID DESC );

This syntax is supported in Azure SQL Data Warehouse and I have tested it. I'm not sure how efficient it will be on billions of rows though. You could also consider partition switching.

If you are deleting a large portion of your table then it might make sense to use a CTAS to put the data you want to keep into a new table, eg something like this:

-- Keep the most recent two BATCHIDS
CREATE TABLE dbo.MyTable2
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = HASH( BATCHID )
    -- Add partition scheme here if required
)
AS
SELECT  *
FROM dbo.MyTable
WHERE BATCHID In ( SELECT TOP 2 BATCHID FROM dbo.MyTable ORDER BY BATCHID DESC )
OPTION ( LABEL = 'CTAS : Keep top two BATCHIDs' );
GO

-- Rename or DROP old table
RENAME OBJECT dbo.MyTable TO MyTable_Old;
RENAME OBJECT dbo.MyTable2 TO MyTable;
GO

-- Optionally DROP MyTable_Old if everything has been successful
-- DROP TABLE MyTable_Old

This technique is described in more detail here.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269873

You can try:

delete t from mytable t
    where batchId < (select max(batchid) from mytable);

Oh, if you want to keep two, perhaps this will work:

delete t from mytable t
    where batchId < (select batchid
                     from mytable
                     group by batchid
                     limit 1 offset 1
                    );

Upvotes: 0

Fahmi
Fahmi

Reputation: 37473

You can try using JOIN

delete d from MyTable d
join 
(
 SELECT
        *,
        RN = ROW_NUMBER() OVER(PARTITION BY BATCH_ID ORDER BY BATCH_ID DESC)
    FROM MyTable
)A on d.batch_id=A.batch_id where RN >2

Upvotes: 1

Related Questions