Reputation: 8597
Why does running multiple SqlBulkCopy inserts, against unrelated tables, async
& in parallel, on a single Transaction seem to behave as though it's running in series instead?
I have some code that is calculating and storing a large volume of data. The calculation is done up-front, so the storage section of the code gets handed this big pile of data to be stored.
My DB writes are being done with SqlBulkCopy.WriteToServerAsync
which does the job nicely, in general.
Amongst the things I need to store are 6 tables which are business-related, but not SQL-related. As such, my write to them needs to be in a transaction, so that an error on any one write reverts the writes on all the others.
The performance of this code is fairly critical, so I want to be able to run the BulkInserts in parallel. There are no FKeys or any other tables being interacted with, (data integrity is managed by the code) so I don't see any reason that this shouldn't be possible.
I thought I knew how to write all the code and have been able to get it all working, but there's a weird performance slow-down that I don't understand:
Happy to provide actual bits of code you want, but this is already a very long Q, and the code would be pretty long to0. LMK if you do want to see anything.
I can write:
"BulkInsert into each table sequentially, all in a single Transaction".
new SqlConnection()
and .BeginTransaction()
,foreach
over the 6 tables, and await InsertToTable(transaction)
each table before the foreach
moves to the next one.foreach
concludes then I .Commit()
the transaction and close the connection."BulkInsert into each table sequentially, with a new connection & Transaction for each table."
foreach
over the 6 tables, and await InsertToTable()
each table before the foreach
moves to the next one.InsertToTable()
call I open a new SqlConnection
and BeginTransaction
, and then I .Commit()
and .Close()
before returning from the method."BulkInsert into each table in parallel, with a new connection & Transaction for each table."
thisTableTask = InsertToTable()
for each table, and capturing the Task
s but not await
ing them (yet).await Task.WhenAll()
the 6 tasks captured.InsertToTable()
call I open a new SqlConnection
and BeginTransaction
, and then I .Commit()
and .Close()
before returning from the method. (but note that the foreach has moved onto the next table, because it doesn't await
the Task immediately."BulkInsert into each table in parallel, all in a single Transaction".
new SqlConnection()
and .BeginTransaction()
.thisTableTask = InsertToTable(transaction)
for each table, and capturing the Task
s but not await
ing them (yet).await Task.WhenAll()
the 6 tasks captured.WhenAll
concludes then I .Commit()
the transaction and close the connection.In all cases the eventual BulkInsert looks like:
using (var sqlBulk = BuildSqlBulkCopy(tableName, columnNames, transactionToUse))
{
await sqlBulk.WriteToServerAsync(dataTable);
}
private SqlBulkCopy BuildSqlBulkCopy(string tableName, string[] columnNames, SqlTransaction transaction)
{
var bulkCopy = new SqlBulkCopy(transaction.Connection, SqlBulkCopyOptions.Default, transaction)
{
BatchSize = 10000,
DestinationTableName = tableName,
BulkCopyTimeout = 3600
};
foreach (var columnName in columnNames)
{
// Relies on setting up the data table with column names matching the database columns.
bulkCopy.ColumnMappings.Add(columnName, columnName);
}
return bulkCopy;
}
As listed above
Those first 3 results all make sense to me.
#1 vs #2: As long as the inserts all work, the Transactions don't do much. The DB is still doing all the same work, at the same points in time.
#2 vs #3: This was the entire point of running the inserts in parallel. By running the inserts in parallel, we spend less time waiting around for SQL to do it's thing. We're making the DB do a lot of work in parallel, so it's not as much as a 6-fold speed up, but it's still plenty.
Why is the last case so slow? And can I fix it?
That's almost as slow as doing it in series, and fully 25% slower than doing it in parallel, but with multiple transactions!
What's going on?
Why does running multiple SqlBulkCopy inserts, against unrelated tables, async
& in parallel, on a single Transaction seem to behave as though it's running in series instead?
SqlBulkCopy Multiple Tables Insert under single Transaction OR Bulk Insert Operation between Entity Framework and Classic Ado.net (Isn't running the queries in parallel)
Using SqlBulkCopy in one transaction for multiple, related tables (Tables are related and they're trying to read back out of them)
Parallel Bulk Inserting with SqlBulkCopy and Azure (that's talking about parallel load into a single table)
Upvotes: 3
Views: 2643
Reputation: 46241
The only way to execute multiple commands concurrently on the same SQL Server connection/transaction is using Multiple Active Result Sets (MARS). MARS is used in the parallel single transaction case because you're using the same connection/transaction for each parallel bulk copy.
MARS executes SELECT and insert bulk operations as interleaved, not parallel, so you'll get about the same performance as serial execution. You need a distributed transaction with different connections for true parallel execution within the same transaction scope.
Upvotes: 6