Reputation: 10201
I'm receiving an error in Azure SQL DW trying to do an UPDATE FROM query. The error is "FROM clause in UPDATE and DELETE statements cannot contain subquery sources or joins"
Is this just specific to SQL DW? I don't see anything wrong with this query otherwise. If it is a limitation of SQL DW, what's the alternative?
-- Permanent fact table with 5 billion rows
CREATE TABLE FactTable (Id1 INT, Id2 INT, EmailAddress NVARCHAR(100), Value1 INT)
WITH (DISTRIBUTION = HASH(EmailAddress));
-- Staging fact table with 10 million rows
CREATE TABLE StageTable (Id1 INT, Id2 INT, EmailAddress NVARCHAR(100), Value1 INT)
WITH (DISTRIBUTION = HASH(EmailAddress), HEAP);
-- Add a secondary index that should help with joining to StageTable
CREATE NONCLUSTERED INDEX ix ON FactTable (Id1, Id2);
UPDATE fact
SET
Value1 = CASE WHEN stage.Value1 > fact.Value1 THEN stage.Value1 ELSE fact.Value1 END
FROM FactTable AS fact
INNER JOIN StageTable AS stage ON fact.Id1 = stage.Id1 AND fact.Id2 = stage.Id2
Upvotes: 5
Views: 13438
Reputation: 14399
As per the documentation Azure SQL Data Warehouse does support UPDATE
but does not support ANSI joins in the FROM
clause. You can use CTAS to work around. Simple two-table update:
UPDATE dbo.FactTable
SET
Value1 = CASE WHEN stage.Value1 > dbo.FactTable.Value1 THEN stage.Value1 ELSE dbo.FactTable.Value1 END
FROM dbo.StageTable AS stage
WHERE dbo.FactTable.Id1 = stage.Id1
AND dbo.FactTable.Id2 = stage.Id2;
More complex example with CTAS, copied wholesale from the main UPDATE documentation page:
-- Create an interim table
CREATE TABLE CTAS_acs
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT ISNULL(CAST([EnglishProductCategoryName] AS NVARCHAR(50)),0) AS [EnglishProductCategoryName]
, ISNULL(CAST([CalendarYear] AS SMALLINT),0) AS [CalendarYear]
, ISNULL(CAST(SUM([SalesAmount]) AS MONEY),0) AS [TotalSalesAmount]
FROM [dbo].[FactInternetSales] AS s
JOIN [dbo].[DimDate] AS d ON s.[OrderDateKey] = d.[DateKey]
JOIN [dbo].[DimProduct] AS p ON s.[ProductKey] = p.[ProductKey]
JOIN [dbo].[DimProductSubCategory] AS u ON p.[ProductSubcategoryKey] = u.[ProductSubcategoryKey]
JOIN [dbo].[DimProductCategory] AS c ON u.[ProductCategoryKey] = c.[ProductCategoryKey]
WHERE [CalendarYear] = 2004
GROUP BY
[EnglishProductCategoryName]
, [CalendarYear]
;
-- Use an implicit join to perform the update
UPDATE AnnualCategorySales
SET AnnualCategorySales.TotalSalesAmount = CTAS_ACS.TotalSalesAmount
FROM CTAS_acs
WHERE CTAS_acs.[EnglishProductCategoryName] = AnnualCategorySales.[EnglishProductCategoryName]
AND CTAS_acs.[CalendarYear] = AnnualCategorySales.[CalendarYear]
;
--Drop the interim table
DROP TABLE CTAS_acs
;
Upvotes: 4
Reputation: 20804
Simplifying what you attempted would work. Just take away the joins and update one table from the other.
update FactTable
set this = that
from StageTable s where s.something = FactTable.something
Whether or not this is the best approach depends on your circumstances, but it will execute without throwing an error.
Upvotes: 1
Reputation: 3078
I find it a good practice with ASDW (and APS/PDW) to avoid bulk updates like the plague.
Here's a pure CTAS alternative that will be faster in cases where you're updating a significant percentage of rows.
It assumes that id1 is a relatively good distribution key, and that the number of staging rows is less than the fact rows, making a replication feasible. This strategy should eliminate data movement between nodes.
If you had a very large staging table, creating a surrogate column in each table that was a combination of id1 and id2, then distributing both tables by the hash of that column, would give even better performance.
create table FactTable (
id1 int,
id2 int,
value1 int)
with (distribution = hash(id1));
create table StageTable (
id1 int,
id2 int,
value1 int)
with (distribution = replicate);
create table UpdatedFact
with (distribution = hash(id1))
as
select f.id1,
f.id2,
case when s.id1 is not null and s.value1 > f.value1
then s.value1
else f.value1
end as value1
from FactTable f
left outer join StageTable s
on s.id1 = f.id1
and s.id2 = f.id2
truncate table FactTable;
alter table UpdatedFact switch to FactTable;
drop table UpdatedFact;
Upvotes: 1