Reputation: 1573
I have a transactions table in a flat file
like
ItemID ,ItemName ,CustomerID ,CustomerName ,Qty ,Price ,TotalValue
and target
transaction table will have
ItemID,CustomerID,Qty,Price,TotalValue
Now I have to import it into the transactions table using SSIS package
But before importing ItemID
and CustomerID
I should look into the lookup
tables ItemMaster
and CustomerMaster
, if not there, then I have insert new tuples into the tables and take the new itemID
or customerID
and import the transaction to the transactions table. It can be done using lookup transformations in SSIS.
Or is it better to import transactions into a temporary table using a SSIS package ,update new ItemIDs
and customer IDs
in the temporary table and then insert transactions from the temp table to the main transactions table
Which option will be better from the performance wise ?
Upvotes: 2
Views: 3654
Reputation: 12281
There are several ways of doing it .
1.Using Staging Table
2.Using Lookup
3.Transforming the stored procedure logic in SSIS
1.Using Staging Table
Dump all the flat file data into a staging table .Lets name it as StgTransaction
.Create a procedure to perform the tasks .
Merge ItemMaster target
using StgTransaction src
on target.ItemID = src.ItemID
WHEN NOT MATCHED THEN
INSERT (ItemName)
values (src.ItemID);
Merge CustomerMaster target
using Trans src
on target.CustomerID = Src.CustomerID
WHEN NOT MATCHED THEN
INSERT (CustomerName)
values (src.CustomerID);
with cte(ItemID ,ItemName ,CustomerID ,CustomerName ,Qty ,Price ,TotalValue) as
(
Select I.ItemID,I.ItemName,
C.CustomerID,C.CustomerName,
f.Qty,f.price,f.TotalValue
from ItemMaster I inner join Trans f
on I.ItemName = f.ItemName
inner join CustomerMaster c
on c.CustomerName = f.CustomerName
)
Insert into Transactions
Select ItemID ,ItemName ,CustomerID ,CustomerName ,Qty ,Price ,TotalValue
from cte
Basically I'm inserting all the missing values into the 2 master tables using Merge Syntax .Instead of Merge you can use NOT EXISTS
Insert into ItemMaster
Select ItemName from stgTransaction s
where not exists
(Select 1 from ItemMaster im
where im.ItemName = s.ItemName
);
Once the missing
values are inserted then just join the staging table with the 2 master
tables and insert it into target
.
Wrap the above query into a procedure
and call the procedure after the Data Flow Task
(Which loads the Data from flat file
to staging table
)
2.Using Lookup
The package design will look like
You should go with this approach if you are not allowed to create staging table in your database . This will be slow because of blocking components (Union ALL) and OlEDB command(problem with RBAR (row by agonizing row) issue)
Steps :-
1.Use lookup
with ItemMaster
table
2.Create a ItemID
column (name it as NewItemID) using Derived transformation
which will store the new ItemID generated from ItemMaster table when the data is loaded .join Lookup with Derived Transformation using No Match Output
3.The No Matched values should be inserted into ItemMaster table.For this lets create a procedure which inserts the data and retrieves the ItemID value as an Output
ALTER PROCEDURE usp_InsertMaster
@ItemName AS varchar(20),
@id AS INT OUTPUT AS
INSERT INTO ItemMaster
(ItemName)
VALUES
(@ItemName)
SET @id = SCOPE_IDENTITY()
//If your using ID as Identity value else use Output clause to retrieve the ID
3.Call this procedure in OLEDB command and map the output with the column created in Derived transformation
OLEDB command
using Union ALL
to combine the rows from matched
and No Matched values
and then again follow the same procedure with the CustomerMaster
table3.Last option is Transforming procedure
logic in SSIS
Package Design is
1.Load the data into staging
2.Use Merge
or Not Exists
and load the missing values in 2 Master tables using Execute SQL Task
3.Use Data Flow Task with source as Staging and 2 lookups with the master tables .Since all the missing values are already inserted into Master tables ,so there wont be any Lookup No match Output
. Just connect the Lookup Match output with Oledb Destination (Transaction Table)
IMHO i think the 1st
approach will be fast
. The problem arises only because there are 2 master tables which needs to be updated along with that get the inserted ID's and load it into target table.So doing it synchronously
is difficult .
Upvotes: 1