rio

Reputation: 761

ssis best practice to load N tables from Source to Target Server

I need to load N (around 50) tables from a source DB to a target one. Each table is different from the others (so different metadata). I thought I could use a parent package to call child packages, where each child has a simple structure such as a single DFT that maps the table to import. The idea of 1 child package -> 1 table -> 1 DFT, plus a config table used by the parent to decide which of the N packages to call, would be a nice design I guess, as that way I could achieve modularity and add/remove tables easily. The bad approach would be to put everything in one monolithic package with a bunch of DFTs... Do you have any idea/example of how to address such a scenario? Mario
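For reference, the config table driving the parent package could look something like this (a minimal sketch; the table and column names are purely illustrative, not an established convention):

    -- Hypothetical config table the parent package reads to decide
    -- which child packages to run; all names here are illustrative.
    CREATE TABLE dbo.ETL_TableConfig
    (
        ConfigId      int IDENTITY(1,1) PRIMARY KEY
    ,   SchemaName    sysname NOT NULL          -- e.g. 'dbo'
    ,   TableName     sysname NOT NULL          -- e.g. 'Customers'
    ,   ChildPackage  nvarchar(260) NOT NULL    -- e.g. 'Load_Customers.dtsx'
    ,   IsEnabled     bit NOT NULL DEFAULT 1    -- toggle a table on/off
    ,   LoadOrder     int NOT NULL DEFAULT 0    -- optional execution sequence
    );

The parent package would then use an Execute SQL Task to read the enabled rows into an object variable, loop over them with a Foreach Loop container, and call each child via an Execute Package Task.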

Upvotes: 0

Views: 7421

Answers (3)

SmartestVEGA

Reputation: 8909

There are many factors that have an impact on which scenario to choose.

But in general:

For small tables with relatively few rows, you can put multiple sources/destinations in a single data flow.

If you have complex ETL for the source/destination, then it is better to put them into separate data flow tasks for clarity.

If you need to define the sequence of execution, you have to use multiple data flow tasks, as you cannot control the order of execution of multiple sources/destinations within a single data flow task.

Whenever you need a different transaction isolation level or behavior, you have to put them into separate data flows.

Whenever you are unsure about the impact of the ETL on the source system, put them in separate data flows, as this will let you optimize the execution order more easily in the future.

If you have large tables, then put them into separate data flow tasks, as this will allow you to optimize buffer sizes for the different tables and tune the ETL process as needed.

So from the above: if you have relatively small tables and straight source/destination mappings, then there is no problem having multiple sources/destinations in a single data flow.

In other cases it is better (or necessary) to put them into separate data flows, as this will allow you to optimize the ETL process from all three points of view:

Load impact on the Source systems

Load impact on the destination systems

Utilization of the machine on which the ETL process is running (CPU consumption, memory consumption, and overall throughput).

Upvotes: 1

Steve Salowitz

Reputation: 1335

You could use an Execute SQL Task in SSIS to run the script below, or execute it directly from a SQL Agent job. This assumes that the schema already exists in the destination DB.

-- Run this in the context of the source database so that
-- information_schema.tables returns the tables you want to copy.
DECLARE @tblname as sysname;
DECLARE @tblschema as sysname;
DECLARE @srcdb as sysname;
DECLARE @destdb as sysname;
DECLARE @tablesql as nvarchar(max);  -- max: the generated statements can easily exceed 255 characters

SET @srcdb = 'MYSRCDB';
SET @destdb = 'MYDESTDB';

DECLARE tbl_cursor CURSOR FOR
SELECT table_schema, table_name FROM information_schema.tables WHERE table_type = 'BASE TABLE'

OPEN tbl_cursor

FETCH NEXT FROM tbl_cursor INTO @tblschema, @tblname
WHILE @@FETCH_STATUS = 0
BEGIN

    -- QUOTENAME brackets each identifier, guarding against names
    -- with spaces or other characters that need quoting
    SET @tablesql = N'SELECT * INTO ' + QUOTENAME(@destdb) + '.' + QUOTENAME(@tblschema) + '.' + QUOTENAME(@tblname)
                  + N' FROM ' + QUOTENAME(@srcdb) + '.' + QUOTENAME(@tblschema) + '.' + QUOTENAME(@tblname);
    EXECUTE sp_executesql @tablesql;

    FETCH NEXT FROM tbl_cursor INTO @tblschema, @tblname
END

CLOSE tbl_cursor
DEALLOCATE tbl_cursor

Upvotes: 2

billinkc

Reputation: 61269

I've become a fan of using biml to solve these sorts of problems.

  1. Download and install BIDS Helper
  2. Add a new biml file to an existing SSIS project
  3. Disable Visual Studio's auto-fix for XML files. See Overcoming BimlScript Copy and Paste Issues in Visual Studio
  4. Paste this code into that Bimlscript.biml file
  5. Fix your source and destination connection strings (lines 8 & 9) to point to the correct servers as well as change the Provider type if SQLNCLI11.1 is not correct for your version of SQL Server
  6. Right-Click on the biml file and select "Generate SSIS Packages"

Assuming everything is lined up, you'd end up with 29 packages, each containing a single data flow task that pulls from source to destination (with the table name driven by an SSIS variable).

<#@ template language="C#" hostspecific="true" #>
<#@ import namespace="System.Data" #>
<#@ import namespace="System.Data.SqlClient" #>
<#@ import namespace="System.IO" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<!--
<#
    string connectionStringSource = @"Server=localhost\dev2012;Initial Catalog=AdventureWorksDW2012;Integrated Security=SSPI;Provider=SQLNCLI11.1";
    string connectionStringDestination = @"Server=localhost\dev2012;Initial Catalog=AdventureWorksDW2012_DEST;Integrated Security=SSPI;Provider=SQLNCLI11.1";

    string SrcTableQuery =     @"
SELECT
    SCHEMA_NAME(t.schema_id) AS schemaName
,   T.name AS tableName
FROM
    sys.tables AS T
WHERE
    T.is_ms_shipped = 0
    AND T.name <> 'sysdiagrams';
";

    DataTable dt = null;
    dt = ExternalDataAccess.GetDataTable(connectionStringSource, SrcTableQuery);
#>    
-->
    <Connections>
        <OleDbConnection
            Name="SRC"
            CreateInProject="false"
            ConnectionString="<#=connectionStringSource#>"
            RetainSameConnection="false">
        </OleDbConnection>
        <OleDbConnection
            Name="DST"
            CreateInProject="false"
            ConnectionString="<#=connectionStringDestination#>"
            RetainSameConnection="false">
        </OleDbConnection>
    </Connections>

    <Packages>
        <# foreach (DataRow dr in dt.Rows) { #>
            <Package ConstraintMode="Linear"
                Name="<#=dr[1].ToString()#>"

            >
            <Variables>
                <Variable Name="SchemaName" DataType="String"><#=dr[0].ToString()#></Variable>
                <Variable Name="TableName" DataType="String"><#=dr[1].ToString()#></Variable>
                <Variable Name="QualifiedTableSchema"
                          DataType="String"
                          EvaluateAsExpression="true">"[" +  @[User::SchemaName] + "].[" +  @[User::TableName] + "]"</Variable>
            </Variables>
            <Tasks>
                <Dataflow
                    Name="DFT"
                >
                    <Transformations>
                        <OleDbSource
                            Name="OLE_SRC <#=dr[0].ToString()#>_<#=dr[1].ToString()#>"
                            ConnectionName="SRC"
                        >
                            <TableFromVariableInput VariableName="User.QualifiedTableSchema"/>
                        </OleDbSource>
                        <OleDbDestination
                            Name="OLE_DST <#=dr[0].ToString()#>_<#=dr[1].ToString()#>"
                            ConnectionName="DST"
                            KeepIdentity="true"
                            TableLock="true"
                            UseFastLoadIfAvailable="true"
                            KeepNulls="true"
                            >
                            <TableFromVariableOutput VariableName="User.QualifiedTableSchema" />                        
                        </OleDbDestination>
                    </Transformations>
                </Dataflow>

            </Tasks>
            </Package>
        <# } #>
    </Packages>
</Biml>

At this point, you simply need to figure out how you want to coordinate the execution of the packages.
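One option for that coordination, assuming the generated packages are deployed to the SSIS catalog (SQL Server 2012+), is to start them from T-SQL via the catalog stored procedures. A minimal sketch; the folder, project, and package names below are hypothetical:

    -- Assumes the packages are deployed to the SSIS catalog (SSISDB)
    -- under a folder/project; names below are illustrative.
    DECLARE @execution_id bigint;

    EXEC [SSISDB].[catalog].[create_execution]
        @folder_name     = N'ETL'               -- hypothetical catalog folder
    ,   @project_name    = N'TableLoads'        -- hypothetical project
    ,   @package_name    = N'DimCustomer.dtsx'  -- one of the generated packages
    ,   @use32bitruntime = 0
    ,   @execution_id    = @execution_id OUTPUT;

    EXEC [SSISDB].[catalog].[start_execution] @execution_id;

Wrapping this in a loop over the package names (from a config table, for example) would let a SQL Agent job run all N loads in sequence; alternatively, a parent package with Execute Package Tasks achieves the same thing inside SSIS.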

Upvotes: 4
