Reputation: 450
I think I am facing a common problem when moving data between two servers but I can't find a good solution. I have two servers: server1
and server2
. I want to move a subset of a table in server1
to a table in server2
. I have the following tables (source and sink tables have the same schema):
server1.dbo.source
- Contains ID + other columns.server2.dbo.sink
- Contains ID + other columns (same schema as source
table).server2.dbo.utilizedIds
- Contains one column with list of IDs I would like to move.If I could reference both servers in the same query, the data that I am trying to move from server1
to server2
could be described by this SQL statement:
SELECT * FROM server1.dbo.source
WHERE server1.dbo.source.id IN (SELECT * FROM server2.dbo.utilizedIds)
What I am trying to achieve is to only move IDs
that I need. These IDs
are dynamic and stored in a table in my destination server. I am doing this because I need a really small subset of the source table (I need thousands of rows out of billions). So far, these are the solutions that I have found and the reasons why I can't use them or I wouldn't like to use them:
server2
and perform the filtering there - Moving this much data every time I need to update my sink is not feasible.server2
and then filter this table - While this would be more performant than option 1, I really don't want to incur the cost of storing billions of rows when I only need thousands.server1
and insert the IDs
into this table, then filter using standard SQL - I am not the owner of server1
, I cannot create tables or stored procedures. The temp table option was promising but temp tables don't persist across activities.IDs
as a string to the SELECT
statement. Something like this: SELECT * FROM server1.dbo.source WHERE id IN ('ID1', 'ID2', 'ID3' ...)
- So far, this is my best solution. However, SQL statements have a length and parameter limit, and this is not a good practiceTo me, this seems like a common use case. What is the best practice or correct solution to this problem? Building my query using strings feels like a hack and not something ADF was designed for.
Upvotes: 0
Views: 410
Reputation: 5074
In data flow
you can use the Join
transformation to get the matching rows of server 1 data (dbo.source) with the IDs of server 2 (dbo.utilizedIds).
Example:
Input:
Server1 – dbo.source
Server2- dbo.utilizedIds
Data flow:
source
transformation in data flow and connect it to the source dataset (server1 dbo.source)source
transformation and connect it to server2 Id’s table (server2 dbo.utilizedIds)Join
transformation and perform an inner join between source1 (server1 dbo.source) and source2 (server2 dbo.utilizedIds) with join condition as Id.Join
transformation will pull all the columns from the left and right streams for the matched rows. To remove duplicate columns, use select transformation and remove the columns.sink
transformation and connect it to the sink table (server2 dbo.sink) to load the source rows matched Ids to the sink.Upvotes: 1