Reputation: 761
I need to load N (around 50) tables from a Source DB to a Target one. Each table is different from the others (so different metadata). I thought I could use a parent package to call the child packages, where each child has a simple structure such as a single DFT that maps the table to import. The idea of 1 child package -> 1 table -> 1 DFT, plus a config table used by the parent to decide which of the N packages to call, would be a nice design I guess, as that way I could achieve modularity and add/remove tables easily. The bad approach would be to put everything into one monolithic package with a bunch of DFTs. Do you have any idea/example to address such a scenario? Mario
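For illustration, a minimal sketch of the driver/config table such a parent package could read; the dbo.TableLoadConfig name and its columns are assumptions for this sketch, not anything already established:

-- Hypothetical config table the parent package would query to decide which child packages to run
CREATE TABLE dbo.TableLoadConfig
(
    TableLoadConfigId INT IDENTITY(1,1) PRIMARY KEY,
    SchemaName        SYSNAME       NOT NULL,
    TableName         SYSNAME       NOT NULL,
    ChildPackageName  NVARCHAR(260) NOT NULL,   -- e.g. 'Load_Customer.dtsx'
    IsEnabled         BIT           NOT NULL DEFAULT (1),
    LoadOrder         INT           NOT NULL DEFAULT (0)
);

-- Adding or removing a table from the load is then just a row change (example row is made up)
INSERT INTO dbo.TableLoadConfig (SchemaName, TableName, ChildPackageName, LoadOrder)
VALUES (N'dbo', N'Customer', N'Load_Customer.dtsx', 1);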
Upvotes: 0
Views: 7421
Reputation: 8909
There are a lot of factors which have an impact on what scenario to choose.
But in general:
For small tables with relatively few rows, you can put multiple sources/destinations in a single data flow.
If you have complex ETL for the source/destination, then it is better to put them into separate data flow tasks for clarity.
If you need to define the sequence of execution, you have to use multiple data flow tasks, as you cannot control the order of execution for multiple sources/destinations within a single data flow task.
Whenever you need a different transaction isolation level or behavior, you have to put them into separate data flows.
Whenever you are unsure of the impact of the ETL on the source system, put them in separate data flows, as this will allow you to optimize the execution order more easily in the future.
If you have large tables, then put them into separate data flow tasks, as this will allow you to optimize buffer sizes for different tables and tune the ETL process as needed.
So from the above: if you have relatively small tables and a straight source/destination mapping, then there is no problem having multiple sources/destinations in a single data flow.
In other cases it is better or necessary to put them into separate data flows, as this will allow you to optimize the ETL process from all three points of view:
Load impact on the source systems
Load impact on the destination systems
Utilization of the machine on which the ETL process is running (CPU consumption, memory consumption and overall throughput).
Upvotes: 1
Reputation: 1335
You could use an Execute SQL Task in SSIS to run this copy, or execute it directly from SQL Agent. This assumes that the schema already exists on the destination db.
-- Copies every base table from the source database into the destination database.
-- Run this in the source database so information_schema.tables lists the source tables.
-- SELECT ... INTO creates the destination tables, so they must not already exist there.
DECLARE @tblname as sysname;
DECLARE @tblschema as sysname;
DECLARE @srcdb as sysname;
DECLARE @destdb as sysname;
DECLARE @tablesql as nvarchar(max);
SET @srcdb = 'MYSRCDB';
SET @destdb = 'MYDESTDB';
DECLARE tbl_cursor CURSOR FOR
SELECT table_schema, table_name FROM information_schema.tables WHERE table_type = 'BASE TABLE'
OPEN tbl_cursor
FETCH NEXT FROM tbl_cursor INTO @tblschema, @tblname
WHILE @@FETCH_STATUS = 0
BEGIN
    -- QUOTENAME brackets the names in case a schema or table requires quoting
    SET @tablesql = N'SELECT * INTO ' + QUOTENAME(@destdb) + N'.' + QUOTENAME(@tblschema) + N'.' + QUOTENAME(@tblname)
                  + N' FROM ' + QUOTENAME(@srcdb) + N'.' + QUOTENAME(@tblschema) + N'.' + QUOTENAME(@tblname);
    EXECUTE sp_executesql @tablesql;
    FETCH NEXT FROM tbl_cursor INTO @tblschema, @tblname
END
CLOSE tbl_cursor
DEALLOCATE tbl_cursor
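If you only need the ~50 tables from the question rather than every base table, one option is to filter the cursor's SELECT; the table names below are placeholders, not real ones from the question:

-- Restrict the cursor to an explicit list of tables (names here are hypothetical)
DECLARE tbl_cursor CURSOR FOR
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
  AND table_name IN (N'Customer', N'Orders', N'OrderLine');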
Upvotes: 2
Reputation: 61269
I've become a fan of using Biml to solve these sorts of problems.
Assuming everything is lined up, you'd end up with one package per source table (29 in this example), each containing a single data flow task that pulls from source to destination based on an SSIS Variable.
<#@ template language="C#" hostspecific="true" #>
<#@ import namespace="System.Data" #>
<#@ import namespace="System.Data.SqlClient" #>
<#@ import namespace="System.IO" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <!--
    <#
    string connectionStringSource = @"Server=localhost\dev2012;Initial Catalog=AdventureWorksDW2012;Integrated Security=SSPI;Provider=SQLNCLI11.1";
    string connectionStringDestination = @"Server=localhost\dev2012;Initial Catalog=AdventureWorksDW2012_DEST;Integrated Security=SSPI;Provider=SQLNCLI11.1";

    string SrcTableQuery = @"
    SELECT
        SCHEMA_NAME(t.schema_id) AS schemaName
    ,   T.name AS tableName
    FROM
        sys.tables AS T
    WHERE
        T.is_ms_shipped = 0
        AND T.name <> 'sysdiagrams';
    ";

    DataTable dt = null;
    dt = ExternalDataAccess.GetDataTable(connectionStringSource, SrcTableQuery);
    #>
    -->
    <Connections>
        <OleDbConnection
            Name="SRC"
            CreateInProject="false"
            ConnectionString="<#=connectionStringSource#>"
            RetainSameConnection="false">
        </OleDbConnection>
        <OleDbConnection
            Name="DST"
            CreateInProject="false"
            ConnectionString="<#=connectionStringDestination#>"
            RetainSameConnection="false">
        </OleDbConnection>
    </Connections>
    <Packages>
        <# foreach (DataRow dr in dt.Rows) { #>
        <Package ConstraintMode="Linear"
            Name="<#=dr[1].ToString()#>"
        >
            <Variables>
                <Variable Name="SchemaName" DataType="String"><#=dr[0].ToString()#></Variable>
                <Variable Name="TableName" DataType="String"><#=dr[1].ToString()#></Variable>
                <Variable Name="QualifiedTableSchema"
                    DataType="String"
                    EvaluateAsExpression="true">"[" + @[User::SchemaName] + "].[" + @[User::TableName] + "]"</Variable>
            </Variables>
            <Tasks>
                <Dataflow
                    Name="DFT"
                >
                    <Transformations>
                        <OleDbSource
                            Name="OLE_SRC <#=dr[0].ToString()#>_<#=dr[1].ToString()#>"
                            ConnectionName="SRC"
                        >
                            <TableFromVariableInput VariableName="User.QualifiedTableSchema"/>
                        </OleDbSource>
                        <OleDbDestination
                            Name="OLE_DST <#=dr[0].ToString()#>_<#=dr[1].ToString()#>"
                            ConnectionName="DST"
                            KeepIdentity="true"
                            TableLock="true"
                            UseFastLoadIfAvailable="true"
                            KeepNulls="true"
                        >
                            <TableFromVariableOutput VariableName="User.QualifiedTableSchema" />
                        </OleDbDestination>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
        <# } #>
    </Packages>
</Biml>
At this point, you simply need to figure out how you want to coordinate the execution of the packages.
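If the generated packages are deployed to the SSIS catalog (SQL Server 2012+), one option is to drive them from T-SQL. This is only a minimal sketch: the ETL folder and TableLoads project names are placeholders you would replace with your own deployment.

-- Start every package in a catalog project; folder/project names below are placeholders
DECLARE @package_name NVARCHAR(260);
DECLARE @execution_id BIGINT;

DECLARE pkg_cursor CURSOR FOR
SELECT P.name
FROM SSISDB.catalog.packages AS P
    INNER JOIN SSISDB.catalog.projects AS PR ON PR.project_id = P.project_id
WHERE PR.name = N'TableLoads';

OPEN pkg_cursor;
FETCH NEXT FROM pkg_cursor INTO @package_name;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC SSISDB.catalog.create_execution
        @folder_name = N'ETL',
        @project_name = N'TableLoads',
        @package_name = @package_name,
        @use32bitruntime = 0,
        @execution_id = @execution_id OUTPUT;

    -- start_execution returns immediately, so the packages run asynchronously
    EXEC SSISDB.catalog.start_execution @execution_id;

    FETCH NEXT FROM pkg_cursor INTO @package_name;
END
CLOSE pkg_cursor;
DEALLOCATE pkg_cursor;

Alternatively, a parent package with Execute Package Tasks (driven by a config table like the one sketched in the question) keeps the ordering and enable/disable logic inside SSIS itself.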
Upvotes: 4