Reputation: 17418
The following simplified mock-up works fine for a set-based bulk insert of the denormalised data in #BulkData
(improvement suggestions welcome):
IF OBJECT_ID('tempdb..#Things') IS NOT NULL
DROP TABLE #Things
IF OBJECT_ID('tempdb..#Categories') IS NOT NULL
DROP TABLE #Categories
IF OBJECT_ID('tempdb..#ThingsToCategories') IS NOT NULL
DROP TABLE #ThingsToCategories
IF OBJECT_ID('tempdb..#BulkData') IS NOT NULL
DROP TABLE #BulkData
CREATE TABLE #Things
(
ThingId INT IDENTITY(1,1) PRIMARY KEY,
ThingName NVARCHAR(255)
)
CREATE TABLE #Categories
(
CategoryId INT IDENTITY(1,1) PRIMARY KEY,
CategoryName NVARCHAR(255)
)
CREATE TABLE #ThingsToCategories
(
ThingId INT,
CategoryId INT
)
CREATE TABLE #BulkData
(
ThingName NVARCHAR(255),
CategoryName NVARCHAR(255)
)
-- the following would be done from a flat file via a bulk import
INSERT INTO #BulkData (ThingName, CategoryName)
VALUES (N'Thing1', N'Category1'),
       (N'Thing2', N'Category1'),
       (N'Thing3', N'Category2')
INSERT INTO #Categories (CategoryName)
SELECT DISTINCT CategoryName
FROM #BulkData
WHERE CategoryName NOT IN (SELECT DISTINCT CategoryName
FROM #Categories)
INSERT INTO #Things (ThingName)
SELECT DISTINCT ThingName
FROM #BulkData
WHERE ThingName NOT IN (SELECT DISTINCT ThingName FROM #Things)
INSERT INTO #ThingsToCategories (ThingId, CategoryId)
SELECT #Things.ThingId, #Categories.CategoryId
FROM #BulkData
INNER JOIN #Things ON #BulkData.ThingName = #Things.ThingName
INNER JOIN #Categories ON #BulkData.CategoryName = #Categories.CategoryName
SELECT * FROM #Categories
SELECT * FROM #Things
SELECT * FROM #ThingsToCategories
One issue I have with the above is that data in #Things would be accessible before data are inserted into #ThingsToCategories.
Can I wrap the above in a transaction so that #Things only becomes available once the whole bulk import has finished?
Like so:
BEGIN TRANSACTION X
-- insert into all normalised tables
COMMIT TRANSACTION X
Does this work with a couple of million records, though?
I guess one could also lower the logging level?
Upvotes: 0
Views: 146
Reputation: 16146
BEGIN TRANSACTION X
-- insert into all normalised tables
COMMIT TRANSACTION X
The answer is yes. From the SQL Server documentation on transactions:
A transaction is a single unit of work. If a transaction is successful, all of the data modifications made during the transaction are committed and become a permanent part of the database. If a transaction encounters errors and must be canceled or rolled back, then all of the data modifications are erased.
Transactions have four standard properties, usually referred to by the acronym ACID. Quoting from tutorialspoint.com on SQL transactions:
Atomicity: ensures that all operations within the work unit are completed successfully; otherwise, the transaction is aborted at the point of failure, and previous operations are rolled back to their former state.
Consistency: ensures that the database properly changes states upon a successfully committed transaction.
Isolation: enables transactions to operate independently of and transparent to each other.
Durability: ensures that the result or effect of a committed transaction persists in case of a system failure.
Again, yes. The number of records is irrelevant. In my own words this time:
Atomicity: If a transaction succeeds, all the operations within it take effect as soon as the transaction completes, i.e. at the time the transaction is committed. If at least one of the operations fails, all operations are rolled back (in other words, none take hold). The number of operations within the transaction is irrelevant.
Isolation: Other transactions won't see a transaction's operations until that transaction is committed.
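Applied to the bulk import in the question, the all-or-nothing behaviour might be sketched as follows (the `SET XACT_ABORT ON` setting and the `TRY...CATCH` wrapper are my additions, not part of the question's code; `THROW` requires SQL Server 2012 or later):

```sql
-- Abort and roll back the whole transaction on any run-time error
SET XACT_ABORT ON;

BEGIN TRY
    BEGIN TRANSACTION X;

    -- insert into all normalised tables, e.g.:
    INSERT INTO #Categories (CategoryName)
    SELECT DISTINCT CategoryName
    FROM #BulkData
    WHERE CategoryName NOT IN (SELECT CategoryName FROM #Categories);

    -- ... same pattern for #Things and #ThingsToCategories ...

    COMMIT TRANSACTION X;
END TRY
BEGIN CATCH
    -- If anything failed, none of the inserts take effect
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;
END CATCH;
```

Until the COMMIT runs, other sessions at the default isolation level will not see the new rows in #Things.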
There are, however, different transaction isolation levels. The default for SQL Server is READ COMMITTED:
Specifies that statements cannot read data that has been modified but not committed by other transactions. [...]
This level is a trade-off between performance and consistency. Ideally you would want everything SERIALIZABLE (see the documentation, too long to copy/paste here), but that isolation level trades performance(-) for consistency(+). In a lot of cases the READ COMMITTED isolation level is good enough, but you should be aware of how it works and weigh that against how your transaction is supposed to behave with respect to other concurrent transactions.
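If READ COMMITTED is not strict enough for your scenario, the isolation level can be raised for the session before the transaction starts; a minimal sketch (the level applies to the whole session until it is changed again):

```sql
-- Raise the isolation level for this session
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION X;
-- insert into all normalised tables
COMMIT TRANSACTION X;

-- Restore the SQL Server default afterwards
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```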
Note also that a transaction will take locks on database objects (rows, tables, schema...) and that other transactions will block if they want to read or modify those objects (depending on the type of lock). For that reason it is preferable to keep the number of operations within a transaction low. Sometimes, though, a transaction simply has to do a lot of things and can't be broken up.
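When a transaction really does have to touch millions of rows and you can tolerate intermediate states becoming visible, one common compromise is to commit in batches so that each transaction holds its locks only briefly. A sketch (not from the answer above; the batch size of 50,000 is an arbitrary assumption, and note that this deliberately gives up the all-or-nothing guarantee):

```sql
DECLARE @BatchSize INT = 50000;
DECLARE @Rows INT = 1;

WHILE @Rows > 0
BEGIN
    BEGIN TRANSACTION;

    -- Move at most one batch of new categories per transaction
    INSERT INTO #Categories (CategoryName)
    SELECT DISTINCT TOP (@BatchSize) CategoryName
    FROM #BulkData
    WHERE CategoryName NOT IN (SELECT CategoryName FROM #Categories);

    SET @Rows = @@ROWCOUNT;  -- capture before COMMIT resets it

    COMMIT TRANSACTION;
END
```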
Upvotes: 1