Reputation: 115
I have an SSIS Transformation Task that I use as my final destination task to insert data into a SQL Server table. The reason I use the transformation task and not a SQL Server Destination task is that I do not know beforehand what the columns will be in the table we are inserting into.
In a Foreach Loop container, I look for Access DBs (in 97 format). The rest of the control flow basically creates a new SQL database and a table. The Access files are what we call "minute" databases; they contain minute-by-minute information gathered by another process. For each "minute" DB I need to create a new SQL database named after it, containing a table called MINUTE whose columns are created based on certain info from the Access DB. The number of parameters each client has at their site determines the number of columns I need to create in the SQL MINUTE table.
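For illustration, the dynamic column list described above (a VALUE_n/STATUS_n pair per parameter, plus the timestamp) could be built as in this Python sketch. The column names come from the script component later in the question; the assumption that parameter indexes run 0..n-1 is mine:

```python
# Sketch: build the dynamic column list for the MINUTE table,
# one VALUE_n/STATUS_n pair per parameter index, prefixed by the
# timestamp column. Illustrative only, not the package's code.
def build_minute_columns(number_of_params):
    cols = ["dtmTime"]
    for i in range(number_of_params):
        cols.append(f"VALUE_{i}")
        cols.append(f"STATUS_{i}")
    return ", ".join(cols)

print(build_minute_columns(2))
# dtmTime, VALUE_0, STATUS_0, VALUE_1, STATUS_1
```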
In the data flow I have two key components: The OLE DB source component (Source - Minute Table) and the Script Transformation task (Destination - Minute Table).
The "Source - Minute Table" gets the data from the access database. The "Destination - Minute Table" transforms the data and inserts it into the appropriate DB and table.
Everything works as it should. I tested it on a DB with 491,000+ records and it took 1 minute. However, I'm now testing with one of our larger customers, which has over 50 parameters and an Access database containing 2+ million records. The package flies until it reaches around 477,000 records, and then it pretty much comes to a halt: I can wait 10 minutes or longer for the record count to update, and then I'm waiting again.
I've done a lot of research and followed all of the recommendations and guidelines I found: my data source is not sorted, I use a SQL command instead of a table in the OLE DB Source, and so on. I've changed the values of DefaultBufferMaxRows and DefaultBufferSize many times and get the same results.
Code:
Imports System
Imports System.Data
Imports System.Data.OleDb
Imports System.Data.SqlClient
Imports System.IO
Imports System.Text
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

Public Class ScriptMain
    Inherits UserComponent

    Private conn As SqlConnection
    Private cmd As SqlCommand
    Private DBName As SqlParameter
    Private columnsForInsert As SqlParameter
    Private tableValues As SqlParameter
    Private numberOfParams As Integer
    Private db As String
    Private folderPath As String
    Private dbConn As String
    Private folder As String
    Private columnParamIndex As Integer
    Private columnDate As DateTime
    Private columnMinValue As Double
    Private columnStatus As String
    Private columnCnt1 As Int16
    Private dateAdded As Boolean = False
    Private columnStatusCnt As String
    Private columnsConstructed As Boolean = False
    Private buildValues As StringBuilder
    Private columnValues As StringBuilder
    Private i As Integer = 0

    ' Called once, before rows begin to be processed in the data flow.
    Public Overrides Sub PreExecute()
        MyBase.PreExecute()
        Try
            db = Variables.varMinFileName.ToString()
            folder = Variables.varMinFolderName.ToString()
            folderPath = folder & "\" & db & ".mdb"
            dbConn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & folderPath

            ' Count the distinct parameter indexes so we know how many
            ' VALUE_n/STATUS_n column pairs to build.
            Using SourceDataAdapter As New OleDbDataAdapter("SELECT DISTINCT PARAM_INDEX FROM [MINUTE];", dbConn)
                Dim SourceDatatable As New DataTable
                SourceDataAdapter.Fill(SourceDatatable)
                numberOfParams = SourceDatatable.Rows.Count
            End Using

            buildValues = New StringBuilder
            columnValues = New StringBuilder
            columnValues.Append("dtmTime, ")
        Catch ex As Exception
            LogError(ex)
        End Try
    End Sub

    ' Called after all the rows have passed through this component.
    Public Overrides Sub PostExecute()
        MyBase.PostExecute()
        buildValues = Nothing
        columnValues = Nothing
    End Sub

    Public Overrides Sub Input0_ProcessInput(Buffer As Input0Buffer)
        While Buffer.NextRow()
            Input0_ProcessInputRow(Buffer)
        End While
    End Sub

    ' Called once for every row that passes through the component from Input0.
    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        Dim column As IDTSInputColumn100
        Dim rtnValue As String = Variables.varMinFileName.Replace("_", "")
        Dim colName As String

        Try
            For Each column In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
                colName = column.Name.ToString()
                If Not colName.Contains("NULL") Then
                    Select Case colName
                        Case "PARAM_INDEX"
                            columnParamIndex = CType(Row.PARAMINDEX, Byte)
                            If columnsConstructed = False And i <= numberOfParams - 1 Then
                                columnValues.Append(String.Format("VALUE_{0}, STATUS_{0}, ", columnParamIndex.ToString()))
                            End If
                        Case "dtmTIME"
                            columnDate = CType(Row.dtmTIME, DateTime)
                            If dateAdded = False Then ' only need to add once since rows are vertical
                                buildValues.Append("'" & columnDate & "', ")
                                dateAdded = True
                            End If
                        Case "MIN_VALUE"
                            columnMinValue = CType(Row.MINVALUE, Double)
                            buildValues.Append(columnMinValue & ", ")
                        Case "MIN_STATUS"
                            columnStatus = CType(Row.MINSTATUS, String)
                        Case "MIN_CNT_1"
                            columnCnt1 = CType(Row.MINCNT1, Byte)
                            columnStatusCnt = columnStatus & "010" & columnCnt1.ToString().PadLeft(5, "0"c) & "-----"
                            buildValues.Append("'" & columnStatusCnt & "', ")
                    End Select
                End If
            Next

            If i = numberOfParams - 1 Then
                ' One full set of parameters has been collected - insert the assembled row.
                If columnsConstructed = False Then
                    columnValues.Remove(columnValues.Length - 2, 1) ' drop trailing comma
                End If
                buildValues.Remove(buildValues.Length - 2, 1) ' drop trailing comma
                Dim valueResult As String = buildValues.ToString()
                SetStoredProc()
                cmd.Parameters("@Columns").Value = columnValues.ToString()
                cmd.Parameters("@DBName").Value = "[" & rtnValue & "].[dbo].[MINUTE]"
                cmd.Parameters("@Values").Value = valueResult
                cmd.ExecuteNonQuery()
                buildValues.Clear()
                columnsConstructed = True
                dateAdded = False
                columnParamIndex = 0
                columnMinValue = 0
                columnStatus = String.Empty
                columnCnt1 = 0
                i = 0
                conn.Close()
                conn.Dispose()
            Else
                i += 1
            End If
        Catch ex As Exception
            LogError(ex)
        End Try
    End Sub

    ' Opens the connection and prepares the dbo.InsertValues command.
    Private Sub SetStoredProc()
        Try
            Dim dbConnection As String = "Server=(local)\SQLExpress;Database=DataConversion;User ID=sa;Password=sa123;"
            conn = New SqlConnection(dbConnection)
            conn.Open()
            cmd = New SqlCommand("dbo.InsertValues", conn) With {.CommandType = CommandType.StoredProcedure}
            columnsForInsert = New SqlParameter("@Columns", SqlDbType.VarChar, -1) With {.Direction = ParameterDirection.Input}
            cmd.Parameters.Add(columnsForInsert)
            DBName = New SqlParameter("@DBName", SqlDbType.VarChar, -1) With {.Direction = ParameterDirection.Input}
            cmd.Parameters.Add(DBName)
            tableValues = New SqlParameter("@Values", SqlDbType.VarChar, -1) With {.Direction = ParameterDirection.Input}
            cmd.Parameters.Add(tableValues)
        Catch ex As Exception
            LogError(ex)
        End Try
    End Sub

    Private Sub LogError(ex As Exception)
        Using writer As New StreamWriter("C:\MinuteLog.log", True, System.Text.Encoding.ASCII)
            writer.WriteLine(ex.Message)
        End Using
    End Sub
End Class
Since I can't upload images yet, I've included a link to a blog post I created with ample screenshots to help illustrate the problem: SSIS slows down during transformation task
Any help in determining why my package slows after 400k records and doesn't process all 2+ million records in a reasonable time is much appreciated!
Thanks, Jimmy
Upvotes: 2
Views: 1692
Reputation: 115
The full solution can be viewed with screenshots on my blog - SSIS slowdown solved
To get around SSIS slowing down when a large number of records are transformed and inserted into SQL Server as my destination, I redesigned my SSIS package. Instead of doing an insert in a data transformation task for every record that comes through the buffer, I eliminated that step and use a stored procedure to do a bulk insert.

To accomplish this, I read the data from each Access DB into a staging table called "MINUTE" on my SQL Server instance. This staging table has the same schema as the Access DBs, and I let SSIS do the heavy lifting of importing all the data into it. After the data is imported, I execute my stored procedure, which transforms the data in the staging table (many vertical records per timestamp) and bulk inserts it into my new destination MINUTE SQL table (one horizontal record per timestamp).
The stored procedure that does the bulk insert and transforms the data looks like this:
CREATE PROCEDURE [dbo].[InsertMinuteBulk]
    -- Add the parameters for the stored procedure here
    (@Columns varchar(MAX), @DBName varchar(4000))
AS
BEGIN
    DECLARE @SQL varchar(MAX)
    SET @SQL = ';WITH Base AS (
        SELECT dtmTime,
               param_index,
               CONVERT(nvarchar(16), MIN_VALUE) AS [VALUE_],
               CONVERT(nvarchar(3), MIN_STATUS) + ''000'' + LEFT(REPLICATE(''0'', 5) + CONVERT(nvarchar(5), MIN_CNT_1), 5) + ''-----'' AS [STATUS_]
        FROM [DataConversion].[dbo].[MINUTE]
    )
    ,norm AS (
        SELECT dtmTime, ColName + CONVERT(varchar, param_index) AS ColName, ColValue
        FROM Base
        UNPIVOT (ColValue FOR ColName IN ([VALUE_], [STATUS_])) AS pvt
    )
    INSERT INTO ' + @DBName + '
    SELECT *
    FROM norm
    PIVOT (MIN(ColValue) FOR ColName IN (' + @Columns + ')) AS pvt'
    EXEC (@SQL);
END
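The reshape the dynamic UNPIVOT/PIVOT above performs (many narrow rows per timestamp collapsed into one wide row with VALUE_n/STATUS_n columns) can be modeled in a few lines of Python. This is an illustrative sketch of the logic only, not code from the package, and the sample values are made up:

```python
# Sketch of the reshape done by InsertMinuteBulk: several vertical
# (dtmTime, param_index, value, status) rows become one wide row per
# dtmTime with VALUE_n/STATUS_n columns.
def pivot_minute_rows(rows):
    out = {}
    for dtm, param_index, value, status in rows:
        wide = out.setdefault(dtm, {"dtmTime": dtm})
        wide[f"VALUE_{param_index}"] = value
        wide[f"STATUS_{param_index}"] = status
    return list(out.values())

rows = [
    ("2013-01-01 00:01", 0, "12.5", "OK1000000001-----"),
    ("2013-01-01 00:01", 1, "7.25", "OK2000000001-----"),
]
wide = pivot_minute_rows(rows)  # one wide row for the single timestamp
```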
In the Data Flow task, the "Minute Data Source" is an ADO.NET data source and feeds the data into my SQL Server destination, "Minute Data Destination".
In the Control Flow, the final task, "Bulk Insert Minute Data", executes the bulk insert stored procedure.
The package now runs uninterrupted and is pretty fast considering the amount of data I'm reading, transforming and inserting.
I've run the package as an SSIS job, and it took 38 minutes to convert seven months' worth of minute data (seven "minute" Access DBs, each with over 2 million rows).
Upvotes: 2
Reputation: 1618
This probably isn't terribly helpful, but my guess is you are running out of memory. In my experience, once SSIS has to page to disk, performance is done for.
Can you batch the work up into several smaller runs?
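The batching idea could look something like this sketch in Python: split the source rows into fixed-size chunks and process each chunk as its own smaller run (illustrative only, not SSIS code):

```python
# Sketch: split a sequence of source rows into fixed-size batches so
# each batch can be processed as a separate, smaller run.
def batches(seq, size):
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

chunks = list(batches(list(range(5)), 2))  # three batches: 2, 2 and 1 rows
```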
Upvotes: 2