Reputation: 75
I'm trying to migrate data from Database 1 to Database 2. In Database 1, I have TableA
with two rows: Product_ID
and Product_Num
. TableB
in Database 2 holds exactly the same information except the TableB.Product_ID
is a smaller subset of TableA.Product_ID
and TableB.Product_Num
are all NULL
. I need to insert the Product_Num
from TableA
into Product_Num
in TableB
only for those Product_IDs
that exist in TableB
. I can migrate the data using a simple SQL statement, however, for this project, everything must be done using SSIS.
I've set up a Data Flow Task linking OLE DB Source (TableA
) to OLE DB Destination (TableB
), but that's about it. I wrote an SQL command (SELECT A.Product_Num FROM Database1.DBO.TableA A JOIN Database2.DBO.TableB B ON A.Product_ID = B.Product_ID
) in OLE DB Source and had it mapped to the Product_Num
column in OLE DB Destination, but the JOIN
is causing some products. Any idea what I can do? Thanks.
Upvotes: 0
Views: 5455
Reputation: 61201
I need to populate the Product_Num for all of data in Database2 from the matching table in Database1.
Establish an OLE DB connection to Database1
and Database2
To make your current query work, you'll need to get a linked server established between Database1
and Database2
.
The other option is to use a query against Database2
to generate the list of all the Product_IDs needing numbers.
SELECT A.Product_ID FROM dbo.TableA AS A WHERE A.Product_Num IS NULL;
This assumes you don't get a linked server set up. You used the term Join in your question and there is a Merge Join
but I don't think you'll need it for this scenario. Instead, you'll be interested in the Lookup Component
. Add one of those after you OLE DB Source and configure it. Use the Database1
Connection Manager (as it contains the desired reference data). Depending on whether this is a 2005 or 2008+ installation, and whether the option of not having a match exists, then you will need to deal with non-matched entities. The default is that the lookup must find a match for every row that passes through it. Generally, I change this behaviour to be Redirect no match output
for 2008+ instances or "ignore failed lookups" (approximate) for 2005.
Leave it in Full Cache (unless the box is starved for memory or the source table is excessively large)
The Query you will want to write is
SELECT B.Product_Num, B.Product_ID FROM dbo.TableB AS B;
Connect the [Product_ID] from the OLE DB Source (left side) to the [Product_ID] in the Lookup (right side) and check the Product_Num.
At this point, you have two columns in your data flow, the Product_ID from B and the corresponding Product_Num from A. The challenge with SSIS is that the updates don't really exist. SSIS is designed to be insert heavy and it behaves like a champ at that. Updates however, are by default singleton operations. Update row, update row, update row... Contrast that with the desired set based approach for Inserts (boom 100000 rows added, boom another N rows added)
The out of the box approach is to use the OLE DB Command object and write your update statement there. For 10, 100, 1000 updates, that's probably nothing. Updating a million row? That approach isn't going to scale.
The approach for large volume updates is to use SSIS for what it does really well (INSERT) against a staging table and then perform a post data flow step of a set-based UPDATE (Execute SQL Task) against the target table from the staging table. This assumes you have the ability to create a table on the target server. Some jobs I've been at this was not an option.
UPDATE B
SET
Product_Num = S.Product_Num
FROM
Stage.TableC S
INNER JOIN
dbo.TableB B
ON B.Product_ID = S.Product_ID
WHERE
B.Product_Num IS NULL;
Upvotes: 2