Reputation: 470
I'm writing my graduate thesis on methods of importing data from a file into a SQL Server table.
I have created my own program and am now comparing it with some standard methods such as BULK INSERT.
My program reads in lines from a source file, parses them,
and imports them one by one using ordinary INSERTs.
The file contains 1 million lines with 4 columns each.
Now I have a situation where my program takes 160 seconds, while the standard methods take 5-10 seconds.
Why are BULK operations faster?
Do they use special means or something?
Can you please explain it or give me some useful links or something?
Upvotes: 11
Views: 44549
Reputation: 11
For at least 20 years, SQL Server (and Sybase) DBAs and developers have used the bcp (bulk copy) utility to quickly import and export large amounts of data, since most consider it the fastest way. Please take a look at this MS utility: https://learn.microsoft.com/en-us/sql/tools/bcp-utility?view=sql-server-ver16
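As a sketch (the server, database, table, and file names below are placeholders, not from the question), a typical character-mode bcp import looks like:

```
bcp MyDatabase.dbo.MyTable in data.csv -S myserver -T -c -t,
```

Here `in` sets the direction (import into the table), `-T` uses a trusted Windows connection, `-c` reads the file as character data, and `-t,` makes the field terminator a comma.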
Upvotes: 0
Reputation: 56
First of all, inserting row by row is not optimal. See this article on set logic and this article on the fastest way to load data into SQL Server.
Second, BULK import is optimized for large loads. This has everything to do with page flushing, writing to the log, indexes, and various other things in SQL Server. There's a TechNet article on how you can optimize bulk inserts, which sheds some light on why BULK is faster. But I can't link more than twice, so you'll have to google for "Optimizing Bulk Import Performance".
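To illustrate the set-logic point (the table and column names below are invented for the sketch), compare a per-row loop with one set-based statement:

```sql
-- Row by row: one round trip and one fully logged statement per row
INSERT INTO dbo.Target (Col1, Col2, Col3, Col4) VALUES (1, 'a', 'b', 'c');
-- ...repeated a million times

-- Set-based: one statement moves all the rows in a single operation,
-- letting the engine optimize constraint checks, index maintenance,
-- and logging across the whole set
INSERT INTO dbo.Target (Col1, Col2, Col3, Col4)
SELECT Col1, Col2, Col3, Col4 FROM dbo.Staging;
```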
Upvotes: 0
Reputation: 2400
BULK INSERT can be a minimally logged operation (depending on various parameters such as the indexes and constraints on the table, the recovery model of the database, etc.). Minimally logged operations log only allocations and deallocations. In the case of BULK INSERT, only extent allocations are logged instead of the actual data being inserted. This provides much better performance than ordinary INSERTs.
The actual advantage is that it reduces the amount of data written to the transaction log.
Under the BULK_LOGGED or SIMPLE recovery model, the advantage is significant.
Optimizing BULK Import Performance
You should also consider reading this answer: Insert into table select * from table vs bulk insert
By the way, there are several factors that will influence BULK INSERT performance:
Whether the table has constraints or triggers, or both.
The recovery model used by the database.
Whether the table into which data is copied is empty.
Whether the table has indexes.
Whether TABLOCK is being specified.
Whether the data is being copied from a single client or copied in parallel from multiple clients.
Whether the data is to be copied between two computers on which SQL Server is running.
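As a sketch of how these factors fit together (the file path, table name, and terminators are assumptions for the example), a minimally logged load into an empty heap could look like:

```sql
-- TABLOCK takes a bulk-update lock on the table; combined with the
-- BULK_LOGGED or SIMPLE recovery model and an empty target table,
-- this allows the load to be minimally logged.
BULK INSERT dbo.Target
FROM 'C:\data\input.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    TABLOCK
);
```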
Upvotes: 15
Reputation: 518
I think you can find a lot of articles on this; just search for "why bulk insert is faster". For example, this seems to be a good analysis:
Generally, a database has a lot of work to do for a single insert: checking constraints, updating indexes, flushing to disk. It can optimize this work when several rows are handled in one operation, instead of the engine being called for each row one by one.
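One concrete consequence (the table and values below are hypothetical): even without BULK INSERT, committing a batch in one explicit transaction avoids a log flush on every single row:

```sql
-- One implicit transaction per row: the log is flushed for every insert
INSERT INTO dbo.Target VALUES (1, 'a', 'b', 'c');
INSERT INTO dbo.Target VALUES (2, 'd', 'e', 'f');

-- One explicit transaction per batch: the log is flushed once, at COMMIT
BEGIN TRANSACTION;
INSERT INTO dbo.Target VALUES (1, 'a', 'b', 'c');
INSERT INTO dbo.Target VALUES (2, 'd', 'e', 'f');
-- ...more rows...
COMMIT TRANSACTION;
```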
Upvotes: 0