Reputation: 1397
The agreed pattern to insert new data into an already existing table in Azure Data Warehouse seems to be...
create table dbo.MyTable_New
with (distribution = round_robin)
as
select
Col1
,Col2
from dbo.MyTable
union all
select
Col1
,Col2
from dbo.MyNewStuff;
Now, what we are seeing is that on really large tables this will degrade over time, and it's obvious why you are reading everything you have already, and re-inserting it.. this seems sub-optimal to me...
Does anyone have any alternative approaches that they have seen work for them, i am thinking of things like partition switching for example...
Upvotes: 1
Views: 766
Reputation: 14379
Agreed with who? CTAS can be an effective option, but it really depends on your data volumes, SLAs, business processes etc. If you are inserting small volumes into an already large table for example, I'm not sure CTAS makes sense, why not INSERT
? You can use partition switching in Azure SQL Data Warehouse (eg here). Other options include Azure Data Factory, SSIS (which now supports Azure SQL Data Warehouse), Polybase, plain INSERT
etc
Please review the following article which offers a number of strategies for loading data into Azure SQL Data Warehouse:
Upvotes: 2