We have a number of DB table merge steps in our Azure Data Factory v2 solution. We merge tables within a single Azure SQL Server database. Source tables and target tables are in different DB schemas. Each source is defined either as a select over a single table or as a join of two tables.
My question is which of the scenarios described below is better from a performance perspective.
Scenario 1: a Stored Procedure activity in the pipeline invokes a stored procedure that does all the work. The procedure upserts the target table with all of the source data in a single MERGE statement. An example of such a stored procedure:
create or alter procedure dwh.fill_lnk_cemvypdet_cemstr2c_table_with_stage_data
as
-- Set-based upsert of the whole stage into the link table in one statement
merge dwh.lnk_cemvypdet_cemstr2c as target
using (
    select
        t.sa_hashkey   as cemvypdet_hashkey,
        t.sa_timestamp as load_date,
        t.sa_source    as record_source,
        d.sa_hashkey   as cemstr2c_hashkey
    from egje.cemvypdet t
    join egje.cemstr2c d
        on t.id_mstr = d.id_mstr
) as source
    on  target.cemvypdet_hashkey = source.cemvypdet_hashkey
    and target.cemstr2c_hashkey  = source.cemstr2c_hashkey
when not matched then
    -- New link: load_date and last_seen_date both start at the stage timestamp
    insert (cemvypdet_hashkey, cemstr2c_hashkey, record_source, load_date, last_seen_date)
    values (source.cemvypdet_hashkey, source.cemstr2c_hashkey, source.record_source,
            source.load_date, source.load_date)
when matched then
    -- Existing link: only refresh the date it was last seen
    update set last_seen_date = source.load_date;
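Scenario 2: a Copy activity specifies a stored procedure to invoke on its Sink tab. The activity then passes the source rows to that stored procedure through a table-valued parameter, calling it once per write batch instead of running one set-based statement over the whole source. The stored procedure for this scenario: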
create or alter procedure dwh.fill_lnk_cemvypdet_cemstr2c_subset_table_row_with_stage_data
    @lnk_cemvypdet_cemstr2c_subset dwh.lnk_cemvypdet_cemstr2c_subset_type readonly
as
-- Upsert the rows received through the table-valued parameter
merge dwh.lnk_cemvypdet_cemstr2c_subset as target
using @lnk_cemvypdet_cemstr2c_subset as source
    on  target.cemvypdet_hashkey = source.cemvypdet_hashkey
    and target.cemstr2c_hashkey  = source.cemstr2c_hashkey
when not matched then
    insert (hashkey, cemvypdet_hashkey, cemstr2c_hashkey, record_source, load_date, last_seen_date)
    values (source.hashkey, source.cemvypdet_hashkey, source.cemstr2c_hashkey,
            source.record_source, source.load_date, source.load_date)
when matched then
    update set last_seen_date = source.load_date;
The parameter @lnk_cemvypdet_cemstr2c_subset is of type dwh.lnk_cemvypdet_cemstr2c_subset_type, a user-defined table type that follows the structure of the target table.
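For reference, such a table type could be declared roughly as follows. This is a minimal sketch: the column names come from the procedure above, but the data types (hex-encoded char(32) hash keys, varchar(100) record source, datetime2(3) dates) and the primary key are assumptions rather than details from the question.

```sql
-- Hypothetical definition of the table type used in scenario 2.
-- Column names match the procedure; all data types are assumptions.
if schema_id(N'dwh') is null
    exec(N'create schema dwh');  -- CREATE SCHEMA must run in its own batch
go

create type dwh.lnk_cemvypdet_cemstr2c_subset_type as table
(
    hashkey           char(32)     not null,  -- assumed: hex-encoded MD5 hash
    cemvypdet_hashkey char(32)     not null,
    cemstr2c_hashkey  char(32)     not null,
    record_source     varchar(100) not null,
    load_date         datetime2(3) not null,
    last_seen_date    datetime2(3) null,      -- mirrors the target; not read by the procedure
    -- Assumed: at most one row per link key, so MERGE never receives
    -- two source rows for the same target row
    primary key (cemvypdet_hashkey, cemstr2c_hashkey)
);
go
```

To test the procedure outside Data Factory, you can declare a variable of this type, insert a few rows into it, and pass it as the @lnk_cemvypdet_cemstr2c_subset parameter, which is essentially what the Copy activity does when it calls the procedure. The primary key is optional; it rejects duplicate link keys within one call and tells the optimizer the source is unique on the join columns.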
Scenario 1 should perform better, provided you take the following optimizations into consideration: