Reputation: 531
I have some working C# code that uses a SqlConnection to create temp tables (e.g., #Foo), call stored procs to fill those temp tables and return results to the C# client, use C# to perform complex calculations on those results, and use the calculation results to update one of the temp tables created earlier.
Because of the temp tables used throughout the process, we must have only one SqlConnection.
I identified a performance bottleneck in updating the temp table with the calculation results. This code was already batching the updates to prevent the C# client from running out of memory. Each batch of calculated data was sent to a stored proc via SqlCommand.ExecuteNonQuery, and the sproc in turn updates the temp table. The code was spending most of its time in this call to ExecuteNonQuery.
So, I changed it to BeginExecuteNonQuery, along with the code to wait on the threads and call EndExecuteNonQuery. This improved performance by about a third, but I am worried about having multiple concurrent calls to SqlCommand.BeginExecuteNonQuery using the same SqlConnection.
Is this OK, or will I run into threading problems?
Sorry for the long explanation.
The MSDN docs state:
The BeginExecuteNonQuery method returns immediately, but until the code executes the corresponding EndExecuteNonQuery method call, it must not execute any other calls that start a synchronous or asynchronous execution against the same SqlCommand object.
This seems to imply that different SqlCommand objects can call BeginExecuteNonQuery before the first SqlCommand completes.
Here is some code that illustrates the issue:
private class SqlCommandData
{
    public SqlCommand Command { get; set; }
    public IAsyncResult AsyncResult { get; set; }
}

public static void TestMultipleConcurrentBeginExecuteNonQueryCalls(string baseConnectionString)
{
    var connectionStringBuilder = new SqlConnectionStringBuilder(baseConnectionString)
    {
        MultipleActiveResultSets = true,
        AsynchronousProcessing = true
    };
    using (var connection = new SqlConnection(connectionStringBuilder.ConnectionString))
    {
        connection.Open();
        // ELIDED - code that uses connection to do various Sql work
        SqlDataReader dataReader = null;
        // in real code, this would be initialized from calls to SqlCommand.ExecuteReader, using same connection
        var commandDatas = new List<SqlCommandData>();
        var count = 0;
        const int maxCountPerJob = 10000;
        while (dataReader.Read())
        {
            count++;
            // ELIDED - do some calculations on data, too complex to do in SQL stored proc
            if (count >= maxCountPerJob)
            {
                count = 0;
                var commandData = new SqlCommandData
                {
                    Command = new SqlCommand { Connection = connection }
                };
                // ELIDED - other initialization of command - used to send the results of calculation back to DB
                commandData.AsyncResult = commandData.Command.BeginExecuteNonQuery();
                commandDatas.Add(commandData);
            }
        }
        dataReader.Close();
        WaitHandle.WaitAll(commandDatas.Select(c => c.AsyncResult.AsyncWaitHandle).ToArray());
        foreach (var commandData in commandDatas)
        {
            commandData.Command.EndExecuteNonQuery(commandData.AsyncResult);
            commandData.Command.Dispose();
        }
        // ELIDED - more code using same SqlConnection to do final work
        connection.Close();
    }
}
Upvotes: 30
Views: 17223
Reputation: 92
I'm not sure if this question is still important, but...
Try moving this line:

connection.Open();

down from the beginning of your using block to after the new SqlCommand is created, but before the BeginExecuteNonQuery, like this:

var commandData = new SqlCommandData
{
    Command = new SqlCommand { Connection = connection }
};
connection.Open();
// ELIDED - other initialization of command - used to send the results of calculation back to DB
commandData.AsyncResult = commandData.Command.BeginExecuteNonQuery();
Upvotes: 0
Reputation: 64218
Well, at the extreme risk of receiving a lot of down votes I have to comment on this one. Firstly, this is a good question and well stated to address the specific potential issue you mentioned. However, you have neglected to discuss this "lengthy" process you're trying to accomplish.
My experience has taught me one thing...
If the question you're asking is hard to answer, change the question.
Though I know very little of your specific problem, I think this neatly applies to your predicament. As others have mentioned... Temp tables are nasty, creating your own tables for a specific task is nastier still, updating large quantities of data in SQL is expensive.
Ask yourself "Can you avoid it all?"
Quite often people choose to implement extremely complicated logic in databases because they believe SQL can do it faster. In practice this is a flawed concept: databases are storage/serialization devices. They are good at storing, updating, locating, and synchronizing access to data. They are not well equipped for processing complex operations. Even after Microsoft's (and others') bastardization of the database by injecting full development languages into it, it cannot perform as optimally as a well-written client (*depending on the complexity of the operations, which I suspect you have surpassed).
As an example, suppose you have a database with around 2gb of raw data and you want to produce a complex report or analysis on the entire data set. Simply put, 2gb of memory is easy to come by: slurp the entire database (or the portion you need) into memory, using dictionaries or whatever to create the look-ups you will need. Depending on several factors, the whole thing will likely run several times faster than SQL, can easily be unit tested, and will be (IMHO) significantly easier to build, debug, and maintain than a nasty assortment of SPROCs constructing dynamic SQL. Even with more than 2gb of raw data, client-side caches can easily be created with several existing technologies (B-Trees, ISAM, or the like).
The product I work on today has 2.4tb of data in the database and we have not a single sproc, join statement, or even a non-equality where clause.
But alas my advice may or may not be pertinent to your specific circumstances since I do not know your objectives or constraints. Hopefully, if nothing else, it will make you ask yourself:
"Am I asking the right question?"
Upvotes: 17
Reputation: 14618
There can be only one DataReader associated with a Command object, and there can be numerous Command objects associated with the same connection. The only thing you can't do here is use the same command with different parameters.
However, when you start a database transaction (implicit if not explicit), the resources associated with that transaction are locked until the transaction is committed or rolled back, and all processes that want to query those resources are put in a queue. SQL Server manages queues pretty well. I had a few problems with deadlocks because of high server load in SQL Server 2000, but there were no such problems with later versions.
It is strange that you actually received a performance improvement. This makes me think you have a very large amount of data, which takes time to process when sending to SQL Server. When transmitting in chunks, less time is consumed because data transmission and data processing are performed concurrently.
Anyway, there shouldn't be any problem with that.
However, consider using CLR Assemblies (if this option is available) to process the information directly in the database engine, without TCP traffic.
Upvotes: 2
Reputation: 2012
You could use a producer-consumer pattern with 2 threads and 2 simultaneous but independent sql connections.
The producer (1st thread) has the DataReader (1st sql connection) and writes its results to a blocking queue. The consumer (2nd thread) reads from the queue, has the ExecuteNonQuery (2nd sql connection) and writes to the temp table(s).
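A minimal sketch of that pattern, assuming a BlockingCollection as the queue; the query text, the stored proc name dbo.UpdateResults, and the row shape are illustrative placeholders, not from the question:

```csharp
using System;
using System.Collections.Concurrent;
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

public static class ProducerConsumerSketch
{
    public static void Run(string connectionString)
    {
        // Bounded queue so the producer blocks instead of exhausting memory.
        var queue = new BlockingCollection<object[]>(boundedCapacity: 1000);

        var producer = Task.Run(() =>
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("SELECT ...", connection)) // ELIDED query
            {
                connection.Open();
                using (var reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        var row = new object[reader.FieldCount];
                        reader.GetValues(row);
                        // ELIDED - complex calculations on the row
                        queue.Add(row);
                    }
                }
                queue.CompleteAdding(); // signals the consumer to finish
            }
        });

        var consumer = Task.Run(() =>
        {
            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();
                foreach (var row in queue.GetConsumingEnumerable())
                {
                    using (var update = new SqlCommand("dbo.UpdateResults", connection))
                    {
                        update.CommandType = CommandType.StoredProcedure;
                        // ELIDED - parameters derived from the calculated row
                        update.ExecuteNonQuery();
                    }
                }
            }
        });

        Task.WaitAll(producer, consumer);
    }
}
```

Note that this only works if the tables the consumer writes to are visible to both sessions (global temp tables or permanent tables), since session-local #temp tables belong to a single connection.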
Another idea, in case your ExecuteNonQuery commands are basically multiple INSERTs: you can batch several SQL statements into a single CommandText (separated by semicolons) and send them to the server as one operation. (Note: SqlCommand.ExecuteNonQuery has no overload taking a StringCollection; the batching happens in the command text itself.)
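For example, multiple statements can be concatenated into one CommandText so the whole batch goes to the server in a single round trip; the table, columns, and result collection here are made up for illustration:

```csharp
using System.Data.SqlClient;
using System.Text;

// Assumes 'connection' is the open SqlConnection from the question and
// 'calculatedResults' is a hypothetical collection of calculated rows.
var sql = new StringBuilder();
foreach (var result in calculatedResults)
{
    sql.AppendFormat(
        "UPDATE #Foo SET Result = {0} WHERE Id = {1};", result.Value, result.Id);
}
using (var command = new SqlCommand(sql.ToString(), connection))
{
    command.ExecuteNonQuery(); // one round trip for the whole batch
}
```

In real code you would use parameters or a table-valued parameter rather than formatting values into the SQL string, both for injection safety and for plan reuse.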
Upvotes: 3
Reputation: 3714
This is definitely not safe. It may theoretically work fine forever, but it will always be in danger of failing. Worse, since it is not supported, it could fail in a way that you don't notice, such as returning bad data, instead of throwing an exception.
What MARS allows you to do is execute commands against a connection while in the middle of reading a result set. This is useful, for example, if you want to process each row of a result and send updates back to the database without loading the entire result set into your application first. However, it does NOT allow you to send multiple commands to the same connection simultaneously.
Depending on your needs, you may be able to use MARS appropriately to increase performance without multi-threading. Otherwise, you will need to use multiple connections, which will require the use of global temp tables or permanent tables. Even then, you will need to be careful to avoid deadlocks, and to design your queries such that locking does not destroy the advantage you are trying to gain from multiple connections.
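A sketch of the interleaving MARS does permit, assuming MultipleActiveResultSets=True in the connection string, an open 'connection', a 'selectCommand' on it, and a hypothetical proc dbo.UpdateTempRow:

```csharp
using System.Data;
using System.Data.SqlClient;

// With MARS, a command may execute while a reader is still open on the same
// connection - but execution is interleaved, never truly simultaneous.
using (var reader = selectCommand.ExecuteReader())
{
    while (reader.Read())
    {
        // ELIDED - calculate on the current row
        using (var update = new SqlCommand("dbo.UpdateTempRow", connection))
        {
            update.CommandType = CommandType.StoredProcedure;
            // ELIDED - parameters from the calculation
            update.ExecuteNonQuery(); // legal under MARS while the reader is open
        }
    }
}
```

This is the supported shape of the question's scenario: one outstanding reader plus serialized commands, rather than several BeginExecuteNonQuery calls in flight at once.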
Upvotes: 0
Reputation: 92
Why not just run both commands from two async thread requests using ExecuteNonQuery instead of BeginExecuteNonQuery, and let connection pooling work it out in a more traditional sense? Then just wait on the thread pool.
Upvotes: 0
Reputation: 364269
The question is whether this helps you with data modification statements. MARS is an acronym for Multiple Active Result Sets - a result set is the outcome of a SELECT or FETCH statement, and in .NET it generally means that you can have multiple DataReaders open on the same connection. But any data modification operation is considered atomic, and it must complete before another operation can be executed (or before data retrieval from a result set can continue) - read about it here. So I think that your asynchronous commands are in the end still executed sequentially.
You can use multiple connections if your main connection creates a global temporary table ##TempName instead of #Temp. A global temporary table is visible from other sessions while the main session is still active.
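A small sketch of that visibility, with an illustrative table name and schema:

```csharp
using System.Data.SqlClient;

using (var mainConnection = new SqlConnection(connectionString))
using (var workerConnection = new SqlConnection(connectionString))
{
    mainConnection.Open();
    // ##Results is global: visible to every session while mainConnection lives.
    using (var create = new SqlCommand(
        "CREATE TABLE ##Results (Id INT, Value FLOAT)", mainConnection))
    {
        create.ExecuteNonQuery();
    }

    workerConnection.Open();
    // A second, independent session can write to the same table.
    using (var insert = new SqlCommand(
        "INSERT INTO ##Results (Id, Value) VALUES (1, 42.0)", workerConnection))
    {
        insert.ExecuteNonQuery();
    }
}
```

A session-local #Temp table, by contrast, would be invisible to workerConnection, which is exactly why the question is stuck on a single connection.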
Upvotes: 1
Reputation: 1941
Yes, really good question.
Perhaps you could use a feature that was introduced in SQL Server 2005, called MARS: http://msdn.microsoft.com/en-us/library/ms345109(v=sql.90).aspx
MARS allows for reusing the same connection for reads and writes, yet it has some limitations, and frankly, I don't know anyone who would use it.
From what I see, though, maybe it is possible to look at your problem from a different angle. Instead of using temp tables and having to keep an eye on them through the whole process - which in the end has to be synchronous - you could create a set of permanent tables that contain an additional JobId column. Then you are not constrained to a single thread. You can have a table that keeps a history of the jobs. Once you insert a row into this table, you retrieve SCOPE_IDENTITY() and use that JobId in every step of your algorithm. The tables can hold more than one copy of the results at a time, and any queries that read or update the data use the JobId as the set identifier. If you index the tables correctly, you will have a very clean design that is far more scalable than the solution you are trying to implement now.
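A rough sketch of the JobId idea; the tables dbo.JobHistory and dbo.WorkResults and their columns are hypothetical:

```csharp
using System.Data.SqlClient;

// Register the job and capture its identity; any connection can use this id.
int jobId;
using (var start = new SqlCommand(
    "INSERT INTO dbo.JobHistory (StartedAt) VALUES (SYSDATETIME());" +
    "SELECT CAST(SCOPE_IDENTITY() AS INT);", connection))
{
    jobId = (int)start.ExecuteScalar();
}

// Every write is tagged with the JobId, so concurrent jobs on separate
// connections never see each other's rows.
using (var insert = new SqlCommand(
    "INSERT INTO dbo.WorkResults (JobId, Id, Value) VALUES (@jobId, @id, @value)",
    connection))
{
    insert.Parameters.AddWithValue("@jobId", jobId);
    insert.Parameters.AddWithValue("@id", 1);
    insert.Parameters.AddWithValue("@value", 42.0);
    insert.ExecuteNonQuery();
}

// Reads and updates filter on JobId the same way, and a cleanup step can
// delete WHERE JobId = @jobId when the job finishes.
```

With an index leading on JobId, each job's working set stays cheap to scan, and the design removes the single-connection constraint entirely.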
Regards
Piotr
Upvotes: 1