outdoorsman33
outdoorsman33

Reputation: 115

SSIS Script Transformation slows down after 400k records

I have an SSIS Transformation Task that I use as my final destination task to insert data into a SQL Server table. The reason I use the transformation task and not an SQL Server Destination task is because I do not know beforehand what the columns will in the table that we will be inserting into.

In a for each loop container, I am looking for access DB's (in 97 format). The rest of the control flow basically creates a new SQL database and also a table. The access files are what we call "minute" databases where they contain minute information gathered by another process. I need to create a new SQL DB named after the 'minute' db and a table called 'MINUTE' with the columns created based on certain info from the access db. For each of our clients, based on the number of parameters they have at their site, determines the number of columns I need to create in the SQL Minute table.

In the data flow I have two key components: The OLE DB source component (Source - Minute Table) and the Script Transformation task (Destination - Minute Table).

The "Source - Minute Table" gets the data from the access database. The "Destination - Minute Table" transforms the data and inserts it into the appropriate DB and table.

Everything works as it should. I tested it on a DB with 491,000+ records and it took 1 minute. However, I'm testing with one of our larger customers that has over 50 parameters and the access database contains 2+ million records. The package flies until I reach around 477,000 records, and then it pretty much comes to a halt. I can wait 10 minutes, and even longer, until the record count updates, and then continue to wait again.

I've done much research and followed all of the recommendations and guidelines that I have found. My datasource is not sorted. I use SQL command instead of Table, etc in the OLE DB Source. I've changed the values of DefaultBufferMaxRows and DefaultBufferSize many times and get the same results.

Code:

Public Class ScriptMain
Inherits UserComponent

Private conn As SqlConnection
Private cmd As SqlCommand
Private DBName As SqlParameter
Private columnsForInsert As SqlParameter
Private tableValues As SqlParameter
Private numberOfParams As Integer
Private db As String
Private folderPath As String
Private dbConn As String
Private folder As String
Private columnParamIndex As Integer
Private columnDate As DateTime
Private columnMinValue As Double
Private columnStatus As String
Private columnCnt1 As Int16
Private dateAdded As Boolean = False
Private columnStatusCnt As String
Private columnsConstructed As Boolean = False
Private buildValues As StringBuilder
Private columnValues As StringBuilder
Private i As Integer = 0

'This method is called once, before rows begin to be processed in the data flow.
'
'You can remove this method if you don't need to do anything here.
Public Overrides Sub PreExecute()
    MyBase.PreExecute()

    Try
        'Dim dbConnection As String = "Server=(local)\SQLExpress;Database=DataConversion;User ID=sa;Password=sa123;"
        'conn = New SqlConnection(dbConnection)
        'conn.Open()
        'cmd = New SqlCommand("dbo.InsertValues", conn) With {.CommandType = CommandType.StoredProcedure}

        'columnsForInsert = New SqlParameter("@Columns", SqlDbType.VarChar, -1) With {.Direction = ParameterDirection.Input}
        'cmd.Parameters.Add(columnsForInsert)

        'DBName = New SqlParameter("@DBName", SqlDbType.VarChar, -1) With {.Direction = ParameterDirection.Input}
        'cmd.Parameters.Add(DBName)

        'tableValues = New SqlParameter("@Values", SqlDbType.VarChar, -1) With {.Direction = ParameterDirection.Input}
        'cmd.Parameters.Add(tableValues)

        db = Variables.varMinFileName.ToString
        folder = Variables.varMinFolderName.ToString
        folderPath = folder & "\" & db & ".mdb"
        dbConn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & folderPath

        Using SourceDataAdapter As OleDbDataAdapter = New OleDbDataAdapter("SELECT DISTINCT PARAM_INDEX FROM [MINUTE];", dbConn)
            Dim SourceDatatable As New DataTable

            SourceDataAdapter.Fill(SourceDatatable)

            numberOfParams = SourceDatatable.Rows.Count
        End Using

        'columnValues.Append("dtmTime, ")
        buildValues = New StringBuilder
        columnValues = New StringBuilder

        columnValues.Append("dtmTime, ")

    Catch ex As Exception
        Dim writer As New StreamWriter("C:\MinuteLog.log", True, System.Text.Encoding.ASCII)

        writer.WriteLine(ex.Message)
        writer.Close()
        writer.Dispose()
    Finally

    End Try
End Sub

' This method is called after all the rows have passed through this component.
'
' You can delete this method if you don't need to do anything here.
Public Overrides Sub PostExecute()
    MyBase.PostExecute()
    '
    ' Add your code here
    '
    buildValues = Nothing
    columnValues = Nothing
End Sub

Public Overrides Sub Input0_ProcessInput(Buffer As Input0Buffer)
    While Buffer.NextRow()
        Input0_ProcessInputRow(Buffer)
    End While
End Sub

'This method is called once for every row that passes through the component from Input0.
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    Dim column As IDTSInputColumn100
    Dim rowType As Type = Row.GetType()
    Dim columnValue As PropertyInfo
    Dim result As Object
    Dim rtnValue As String = Variables.varMinFileName.Replace("_", "")
    Dim colName As String

    Try
        For Each column In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
            columnValue = rowType.GetProperty(column.Name)

            colName = column.Name.ToString

            If Not colName.Contains("NULL") Then
                'If Not columnValue Is Nothing Then
                Select Case column.Name.ToString
                    Case "PARAM_INDEX"
                        'result = columnValue.GetValue(Row, Nothing)
                        result = Row.PARAMINDEX
                        columnParamIndex = CType(result, Byte)
                        If columnsConstructed = False And i <= numberOfParams - 1 Then
                            columnValues.Append(String.Format("VALUE_{0}, STATUS_{0}, ", columnParamIndex.ToString))
                        End If
                        Exit Select
                    Case "dtmTIME"
                        'result = columnValue.GetValue(Row, Nothing)
                        result = Row.dtmTIME
                        columnDate = CType(result, DateTime)
                        If dateAdded = False Then ' only need to add once since rows are vertical
                            buildValues.Append("'" & columnDate & "', ")
                            dateAdded = True
                        End If
                        Exit Select
                    Case "MIN_VALUE"
                        'result = columnValue.GetValue(Row, Nothing)
                        result = Row.MINVALUE
                        columnMinValue = CType(result, Double)
                        buildValues.Append(columnMinValue & ", ")
                        Exit Select
                    Case "MIN_STATUS"
                        'result = columnValue.GetValue(Row, Nothing)
                        result = Row.MINSTATUS
                        columnStatus = CType(result, String)
                        Exit Select
                    Case "MIN_CNT_1"
                        'result = columnValue.GetValue(Row, Nothing)
                        result = Row.MINCNT1
                        columnCnt1 = CType(result, Byte)
                        columnStatusCnt = columnStatus & "010" & columnCnt1.ToString.PadLeft(5, "0"c) & "-----"
                        buildValues.Append("'" & columnStatusCnt & "', ")
                    Case Else
                        Exit Select
                End Select
                'End If
            End If
        Next

        If i = numberOfParams - 1 Then
            If columnsConstructed = False Then
                columnValues.Remove(columnValues.Length - 2, 1)
            End If

            buildValues.Remove(buildValues.Length - 2, 1)

            Dim valueResult As String = buildValues.ToString()

            SetStoredProc()

            cmd.Parameters("@Columns").Value = columnValues.ToString
            cmd.Parameters("@DBName").Value = "[" & rtnValue & "].[dbo].[MINUTE]"
            cmd.Parameters("@Values").Value = valueResult
            cmd.ExecuteNonQuery()

            buildValues.Clear()

            columnsConstructed = True
            dateAdded = False
            columnParamIndex = 0
            columnMinValue = 0
            columnStatus = String.Empty
            columnCnt1 = 0

            i = 0
            conn.Close()
            conn.Dispose()
        Else
            i += 1
        End If
    Catch ex As Exception
        Dim writer As New StreamWriter("C:\MinuteLog.log", True, System.Text.Encoding.ASCII)

        writer.WriteLine(ex.Message)
        writer.Close()
        writer.Dispose()
    Finally
        'buildValues = Nothing
        'columnValues = Nothing
    End Try
End Sub

Private Sub SetStoredProc()
    Try
        Dim dbConnection As String = "Server=(local)\SQLExpress;Database=DataConversion;User ID=sa;Password=sa123;"
        conn = New SqlConnection(dbConnection)
        conn.Open()
        cmd = New SqlCommand("dbo.InsertValues", conn) With {.CommandType = CommandType.StoredProcedure}

        columnsForInsert = New SqlParameter("@Columns", SqlDbType.VarChar, -1) With {.Direction = ParameterDirection.Input}
        cmd.Parameters.Add(columnsForInsert)

        DBName = New SqlParameter("@DBName", SqlDbType.VarChar, -1) With {.Direction = ParameterDirection.Input}
        cmd.Parameters.Add(DBName)

        tableValues = New SqlParameter("@Values", SqlDbType.VarChar, -1) With {.Direction = ParameterDirection.Input}
        cmd.Parameters.Add(tableValues)
    Catch ex As Exception
        Dim writer As New StreamWriter("C:\MinuteLog.log", True, System.Text.Encoding.ASCII)

        writer.WriteLine(ex.Message)
        writer.Close()
        writer.Dispose()
    End Try
End Sub
End Class

Since I can't upload images yet here I've included a blog link I created that includes ample screen shots to help understand the problem mentioned here: SSIS slows down during transformation task

Any help in determining why my package slows after 400k records and doesn't process all 2+ million records in a reasonable time is much appreciated!

Thanks, Jimmy

Upvotes: 2

Views: 1692

Answers (2)

outdoorsman33
outdoorsman33

Reputation: 115

Full solution can be viewed here on my blog with screenshots - SSIS slowdown solved

In order to get around SSIS slowing down when a large number of records are being transformed and inserted into SQL Server as my destination, I redesigned my SSIS package. Instead of doing an insert in a data transformation task for every record that comes through the buffer, I’ve eliminated it and have used a stored procedure to do a bulk insert. In order to accomplish this, I read in the data from each access DB into a table called “MINUTE” in my SQL Server instance. This minute table has the same schema as the access DB’s and I let SSIS do the heavy lifting of importing all the data into this table. After the data is imported, I execute my stored procedure which transforms the data in this minute table (horizontal records) and does a bulk insert into my new destination MINUTE SQL table (one vertical record.)

The stored procedure that does the bulk insert and transforms the data looks like this:

PROCEDURE [dbo].[InsertMinuteBulk]
 -- Add the parameters for the stored procedure here
 (@Columns varchar(MAX), @DBName varchar(4000))
 AS
 BEGIN
 DECLARE @SQL varchar(MAX)

SET @SQL =’;WITH Base AS (
 SELECT dtmTime,
 param_index,
 CONVERT(nvarchar(16), MIN_VALUE) AS [VALUE_],
 CONVERT(nvarchar(3), MIN_STATUS) + ”000” + LEFT(replicate(”0”,5) + CONVERT(nvarchar(5), MIN_CNT_1),5) + ”—–” AS [STATUS_]
 FROM [DataConversion].[dbo].[MINUTE]
 )
 ,norm AS (
 SELECT dtmTime, ColName + CONVERT(varchar, param_index) AS ColName, ColValue
 FROM Base
 UNPIVOT (ColValue FOR ColName IN ([VALUE_], [STATUS_])) AS pvt
 )
 INSERT INTO ‘ + @DBName + ‘
SELECT *
 FROM norm
 PIVOT (MIN(ColValue) FOR ColName IN (‘+@Columns+’)) AS pvt’

EXEC (@SQL);

In the Data Flow task, the “Minute Data Source" is an ADO.NET Data Source and feeds the data into my SQL Server destination – "Minute Data Destination".

In the Control Flow, the final task of "Bulk Insert Minute Data" executes the Bulk Insert stored procedure.

The package now runs uninterrupted and is pretty fast considering the size of data that I’m reading, transforming and inserting.

I’ve ran the package as an SSIS job and it took 38 minutes to complete converting 7 months (or 7 minute access DB’s) worth of minute data with over 2 million rows in each access DB.

Upvotes: 2

SinisterPenguin
SinisterPenguin

Reputation: 1618

This probably isn't terribly helpful but my guess is you are running out of memory. If SSIS has to page you've had it in my experience.

Can you batch up the work somehow in several smaller runs perhaps?

Upvotes: 2

Related Questions