Egor Antonov

Reputation: 91

SQL Server import faster than bulk import

I have a large (~50 GB, ~300 million rows) tab-separated file, which I want to import into a SQL Server table with columns: char(10), varchar(512), nvarchar(512), nvarchar(512).

It takes about a day to bulk import it using T-SQL, SSIS, or the C# SqlBulkCopy class.

Is there any faster way to load this data?

Or could there be some condition slowing it down, which I can remove or change?
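To illustrate, the T-SQL attempt is essentially a plain BULK INSERT along these lines (the table name and file path are placeholders):

    -- Placeholder table name and file path; the real table has the four columns described above.
    BULK INSERT dbo.ImportTarget
    FROM 'D:\data\bigfile.tsv'
    WITH (
        FIELDTERMINATOR = '\t',  -- tab-separated source
        ROWTERMINATOR   = '\n'
    );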

Upvotes: 1

Views: 4221

Answers (3)

Patrick

Reputation: 1

Have you tried to load the data in parallel? Here is a simple SSIS solution:

  1) Install an SSIS row number generator (http://www.konesans.com/products/row-number-transformation.aspx).
  2) Create an SSIS package to read the file.
  3) In the data flow task, read the source file.
  4) In the data flow task, insert the row number generator.
  5) In the data flow task, insert a conditional split on the row number to create multiple threads (i.e. rownumber % 3 == 0, rownumber % 3 == 1, rownumber % 3 == 2).
  6) In the data flow task, insert the destination table.

Hope this helps.

Upvotes: 0

Pete Carter

Reputation: 2731

I have just spent the last few weeks optimizing a very large load myself. Bulk inserting is the fastest way; I found BCP faster than SSIS or the T-SQL BULK INSERT statement. Either way, there are things you can do to tune it (a combined T-SQL sketch follows the list below):

  • Try raising or lowering the rows-per-batch setting to shift resource pressure between CPU and memory (higher reduces CPU pressure, lower reduces memory pressure).
  • If there is a clustered index, or there are non-clustered indexes, drop them and re-create them after the insert.
  • Split the TSV into smaller files using a .NET tool and then load them into the table in parallel. This requires the table to be a heap (clustered index dropped).
  • Make sure the load is minimally logged. For a heap this requires TABLOCK; for a clustered index it requires trace flag 610 and the data to be ordered the same as the clustered index key. Either way, the database must be in the SIMPLE or BULK_LOGGED recovery model.
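A rough sketch pulling these settings together might look like this (database, table, index, and file names are placeholders, and it assumes the recovery model can be switched for the duration of the load):

    -- Placeholder names throughout; assumes BULK_LOGGED recovery is acceptable during the load.
    ALTER DATABASE ImportDb SET RECOVERY BULK_LOGGED;

    -- Drop indexes (including the clustered index, if any) so the target is a heap during the load.
    DROP INDEX IX_ImportTarget_Key ON dbo.ImportTarget;

    BULK INSERT dbo.ImportTarget
    FROM 'D:\data\bigfile.tsv'
    WITH (
        FIELDTERMINATOR = '\t',
        ROWTERMINATOR   = '\n',
        TABLOCK,                   -- required for minimal logging into a heap
        BATCHSIZE       = 1000000  -- rows per committed batch; raise or lower to trade memory for commit overhead
    );

    -- Re-create the indexes after the load.
    CREATE INDEX IX_ImportTarget_Key ON dbo.ImportTarget (Col1);

    -- Switch back, then take a full or log backup to restart the log chain.
    ALTER DATABASE ImportDb SET RECOVERY FULL;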

Upvotes: 2

Alf47

Reputation: 571

If you are inserting into an existing table, drop all indexes prior to the import and re-create them afterwards.

If you are using SSIS, you can tweak the batch and commit sizes.

Verify there is adequate memory on the server for such a large data load.

Perform the loading operation on the local server (copy file locally, don't load over the network).

Configure your destination database and transaction log auto-growth options to a reasonable value, such as a few hundred MB per growth increment (the default is typically 1 MB growth for the primary data file, the .mdf). Growth operations are slow and expensive, so you want to minimize them.
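For example, something along these lines (the database and logical file names are placeholders; query sys.database_files for the actual names):

    -- Placeholder database and logical file names; adjust to your environment.
    ALTER DATABASE ImportDb MODIFY FILE (NAME = ImportDb_Data, FILEGROWTH = 512MB);
    ALTER DATABASE ImportDb MODIFY FILE (NAME = ImportDb_Log,  FILEGROWTH = 512MB);

    -- Better still, pre-size the files so the load never has to auto-grow at all.
    ALTER DATABASE ImportDb MODIFY FILE (NAME = ImportDb_Data, SIZE = 60GB);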

Make sure your data and log files are on fast disks, preferably on separate LUNs. Ideally you want your log file on a mirrored LUN separate from your data file (you may need to talk to your storage admin or hosting provider about options).

Upvotes: 2
