Brian Vallelunga
Brian Vallelunga

Reputation: 10201

UPDATE FROM in Azure SQL DW?

I'm receiving an error in Azure SQL DW trying to do an UPDATE FROM query. The error is "FROM clause in UPDATE and DELETE statements cannot contain subquery sources or joins"

Is this just specific to SQL DW? I don't see anything wrong with this query otherwise. If it is a limitation of SQL DW, what's the alternative?

-- Permanent fact table with 5 billion rows
CREATE TABLE FactTable (Id1 INT, Id2 INT, EmailAddress NVARCHAR(100), Value1 INT)
WITH (DISTRIBUTION = HASH(EmailAddress));

-- Staging fact table with 10 million rows    
CREATE TABLE StageTable (Id1 INT, Id2 INT, EmailAddress NVARCHAR(100), Value1 INT)
WITH (DISTRIBUTION = HASH(EmailAddress), HEAP);

-- Add a secondary index that should help with joining to StageTable
CREATE NONCLUSTERED INDEX ix ON FactTable (Id1, Id2);

UPDATE fact
SET
   Value1 = CASE WHEN stage.Value1 > fact.Value1 THEN stage.Value1 ELSE fact.Value1 END
FROM FactTable AS fact
INNER JOIN StageTable AS stage ON fact.Id1 = stage.Id1 AND fact.Id2 = stage.Id2

Upvotes: 5

Views: 13438

Answers (3)

wBob
wBob

Reputation: 14399

As per the documentation Azure SQL Data Warehouse does support UPDATE but does not support ANSI joins in the FROM clause. You can use CTAS to work around. Simple two-table update:

UPDATE dbo.FactTable
SET
   Value1 = CASE WHEN stage.Value1 > dbo.FactTable.Value1 THEN stage.Value1 ELSE dbo.FactTable.Value1 END
FROM dbo.StageTable AS stage
WHERE dbo.FactTable.Id1 = stage.Id1 
  AND dbo.FactTable.Id2 = stage.Id2;

More complex example with CTAS, copied wholesale from the main UPDATE documentation page:

-- Create an interim table
CREATE TABLE CTAS_acs
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT  ISNULL(CAST([EnglishProductCategoryName] AS NVARCHAR(50)),0)    AS [EnglishProductCategoryName]
,       ISNULL(CAST([CalendarYear] AS SMALLINT),0)                      AS [CalendarYear]
,       ISNULL(CAST(SUM([SalesAmount]) AS MONEY),0)                     AS [TotalSalesAmount]
FROM    [dbo].[FactInternetSales]       AS s
JOIN    [dbo].[DimDate]                 AS d    ON s.[OrderDateKey]             = d.[DateKey]
JOIN    [dbo].[DimProduct]              AS p    ON s.[ProductKey]               = p.[ProductKey]
JOIN    [dbo].[DimProductSubCategory]   AS u    ON p.[ProductSubcategoryKey]    = u.[ProductSubcategoryKey]
JOIN    [dbo].[DimProductCategory]      AS c    ON u.[ProductCategoryKey]       = c.[ProductCategoryKey]
WHERE   [CalendarYear] = 2004
GROUP BY
        [EnglishProductCategoryName]
,       [CalendarYear]
;

-- Use an implicit join to perform the update
UPDATE  AnnualCategorySales
SET     AnnualCategorySales.TotalSalesAmount = CTAS_ACS.TotalSalesAmount
FROM    CTAS_acs
WHERE   CTAS_acs.[EnglishProductCategoryName] = AnnualCategorySales.[EnglishProductCategoryName]
AND     CTAS_acs.[CalendarYear]               = AnnualCategorySales.[CalendarYear]
;

--Drop the interim table
DROP TABLE CTAS_acs
;

Upvotes: 4

Dan Bracuk
Dan Bracuk

Reputation: 20804

Simplifying what you attempted would work. Just take away the joins and update one table from the other.

update FactTable
set this = that
from StageTable s where s.something = FactTable.something

Whether or not this is the best approach depends on your circumstances, but it will execute without throwing an error.

Upvotes: 1

Ron Dunn
Ron Dunn

Reputation: 3078

I find it a good practice with ASDW (and APS/PDW) to avoid bulk updates like the plague.

Here's a pure CTAS alternative that will be faster in cases where you're updating a significant percentage of rows.

It assumes that id1 is a relatively good distribution key, and that the number of staging rows is less than the fact rows, making a replication feasible. This strategy should eliminate data movement between nodes.

If you had a very large staging table, creating a surrogate column in each table that was a combination of id1 and id2, then distributing both tables by the hash of that column, would give even better performance.

create  table FactTable (
        id1 int,
        id2 int,
        value1 int)
with    (distribution = hash(id1));

create  table StageTable (
        id1 int,
        id2 int,
        value1 int)
with    (distribution = replicate);

create  table UpdatedFact 
with    (distribution = hash(id1)) 
as
select  f.id1,
        f.id2,
        case when s.id1 is not null and s.value1 > f.value1 
            then s.value1
            else f.value1
            end as value1
from    FactTable f
        left outer join StageTable s
        on s.id1 = f.id1
        and s.id2 = f.id2

truncate table FactTable;
alter table UpdatedFact switch to FactTable;
drop table UpdatedFact;

Upvotes: 1

Related Questions