Reputation: 67
First, I am new to SSIS so I am still getting the hang of things.
I am using Visual Studio 19 and SSMS 19
Regardless, I have set-up an OLE DB Package from .TSV file to table in SSMS. The issue is that it took 1 hour and 11 minutes to execute for 500,000 rows.
The data is extremely variable so I have set-up a staging table in SSMS that is essentially all varchar(max) columns. Once all the data is inserted, then I was going to look at some aggregations like max(len(<column_name>)) in order to better optimize the table and the SSIS package.
Anyways, there are 10 of these files so I need to create a ForEach File loop. This would take at minimum (1.17 hours)*10=11.70 hours of total runtime.
I thought this was a bit long and created a BULK INSERT Task, but I am having some issues. It seems very straightforward to set-up.
I added the Bulk Insert Task to the Control Flow tab and went into the Bulk Insert Task Editor Dialogue Box.
From here, I configured the Source and Destination connections. Both of which went very smoothly. I only have one local instance of SQL Server on my machine so I used localhost.<database_name> and the table name for the Destination Connection.
I run the package and it executes just fine without any errors or warnings. It takes less than a minute for a roughly 600 MB .TSV file to load into a SSMS table with about 300 columns of varchar(max).
I thought this was too quick and it was. Nothing loaded, but the package executed!!!
I have tried searching for this issue with no success. I checked my connections too.
Do I need Data Flow Tasks for Bulk Insert Tasks? Do I need any connection managers? I had to configure Data Flow Tasks and connection managers for the OLE DB package, but the articles I have referenced do not do this for Bulk Insert Tasks.
What am I doing wrong?
Any advice from someone more well-versed in SSIS would be much appreciated.
Upvotes: 0
Views: 633
Reputation: 4477
Regarding my comment about using a derived column in place of a real destination, it would look like 1 in the image below. You can do this in a couple of steps:
If 1) takes a long time, it could indicate a bottleneck with slow read times on the disk where the file is or a network bottleneck if the file is on another server on a shared drive. If 2) adds a lot more time, it could indicate a memory bottleneck on the server that SSIS is running. Please note that you testing this on a server is the best way to test performance, because it removes a lot of issues that probably won't exist there such as network bottlenecks and memory constraints.
Lastly, please turn on the feature noted as 2) below, AutoAdjustBufferSize. This will change the settings for DefaultBufferSize (max memory in the buffer) and DefaultBufferMaxRows (total rows allowed in each buffer, these are the numbers that you see next to the arrows in the dataflow when you run the package interactively). Because your column sizes are so large, this will give a hint to the server to maximize the buffer size which gives you a bigger and faster pipeline to push the data through.
One final note, if you add the real destination and that has a significant impact on time, you can look into issues with the target table. Make sure there are no indexes including a cluster index, make sure tablock is on, make sure there are no constraints or triggers.
Upvotes: 1