Reputation: 34
I'm facing the following problem: One of the colleages from the data science department set up a ADF pipeline which should copy data from our production database into our development database. To avoid errors with related data and writing permissions, the data from the production environment is stored in a newly created schema 'stg' within the development database. For sizing reasons, only the top 1000 entries of each table will be copied. The data should be stored at the end in the 'dbo' schema with all constraints and foreign keys. The copying from the tables in the stg schema to the dbo schema is solved with a stored procedure which only exists in the development database. So far the pipeline works. But when it comes to the copying from one schema to the other, I get errors with the foreign key relationships. This is, because it's not sure that the first 1000 entries of table A are linked to the first 1000 entries of table B. It could be, that entry nr. X of table A is linked to entry 1000 + Y in table B. My question: Is there any simple solution to solve this within the ADF pipeline or do I have to write some SQL code to fetch all related data from all related tables within our pipeline?
Here is the picture of our pipeline:
Pipeline: First search for all tables, then loop through all entries and fetch first 1000 entries and store them into the destination database
Here is our Lookup code
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE table_type = 'BASE TABLE' and TABLE_SCHEMA ='dbo' and TABLE_CATALOG ='[database name]'
And with this we extract the first 1000 lines from the tables given by the lookup (within the copy activity):
@concat('select top 1000 * from ', ' dbo.',item().TABLE_NAME)
So far we tried to enlarge the result set in the SELECT statement or adding some sorting algorithms, but both tries ended in the same error with the missing foreign keys.
Upvotes: 1
Views: 256
Reputation: 7156
There is no direct way to copy only related data from all tables in ADF. To do this, Below is the apprach
Sample lookup table
Order | Table | Related Table | Lookup_Query | Copy_Source_Query |
---|---|---|---|---|
1 | child_table1 | parent_table1 | SELECT DISTINCT fk_childtable1 FROM parent_table1 | SELECT * FROM department WHERE dept_id IN (@{variables('arrayvar1')}) |
2 | child_table2 | child_table1 | SELECT DISTINCT fk_childtable2 FROM child_table1 | SELECT * FROM staff WHERE dept_id IN (@{variables('arrayvar1')}) |
Upvotes: 2