Pure.Krome

Reputation: 86957

How to avoid .NET Connection Pool timeouts when inserting 37k rows

I'm trying to figure out the best way to batch insert about 37k rows into my Sql Server using DAPPER.

My problem is that when I use Parallel.ForEach, the number of connections to the database climbs over a short period of time, eventually hitting nearly 100 ... which gives connection pool errors. If I force the max degree of parallelism, it hits that max number and stays there.

Setting MaxDegreeOfParallelism feels wrong.

It currently is doing about 10-20 inserts a second. This is also in a simple Console App - so there's no other database activity besides what's happening in my Parallel.ForEach loop.

Is using Parallel.ForEach the wrong choice here, because this work is not CPU-bound?

Should I be using async/await? If so, what's stopping this from doing hundreds of db calls in one go?

Sample code, which is basically what I'm doing:

var items = GetItemsFromSomewhere(); // Returns 37K items.

Parallel.ForEach(items, item =>
{
    using (var sqlConnection = new SqlConnection(_connectionString))
    {
        var result = sqlConnection.Execute(myQuery, new { ... } );
    }
});

My (incorrect) understanding of this was that there should only be about 8 or so connections to the db at any time. The connection pool releases the connection (which remains instantiated in the pool, waiting to be used). And if the Execute takes, let's say, even a full second (the longest running insert was about 500ms, and that's roughly 1 in every 100), that's OK: that thread blocks and waits until the Execute completes. Then the scope ends (Dispose is called automatically) and the connection is closed. With the connection closed, the Parallel.ForEach grabs the next item in the collection, goes to the connection pool and grabs a spare connection (remember, we just closed one a split second ago) ... rinse, repeat.

Is this wrong?


Upvotes: 0

Views: 1868

Answers (2)

Martin Mulder

Reputation: 12954

First of all: if this is about performance, use SqlBulkCopy. This works with SQL Server. If you are using another database server, it may have its own bulk-copy solution (Oracle has one).

SqlBulkCopy works like a bulk select: one statement opens one connection and streams all the data from the server to the client. A bulk insert works the other way around: it streams all the new records from the client to the server.

See: https://msdn.microsoft.com/en-us/library/ex21zs8x(v=vs.110).aspx
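For illustration, here is a minimal sketch of what that could look like (the destination table "dbo.Items" and the Name/Price columns are hypothetical; adjust them to your own schema):

var table = new DataTable();
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Price", typeof(decimal));

foreach (var item in items)                          // copy the 37K items into a DataTable
{
    table.Rows.Add(item.Name, item.Price);
}

using (var connection = new SqlConnection(_connectionString))
using (var bulkCopy = new SqlBulkCopy(connection))
{
    connection.Open();
    bulkCopy.DestinationTableName = "dbo.Items";
    bulkCopy.WriteToServer(table);                   // one connection, one streamed operation
}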

If you insist on using parallelism, you might want to consider the following code:

void BulkInsert<T>(object p)
{
    IEnumerator<T> e = (IEnumerator<T>)p;
    using (var sqlConnection = new SqlConnection(_connectionString))
    {
        sqlConnection.Open();          // open once; this thread reuses the same connection
        while (true)
        {
            T item;
            lock (e)                   // the enumerator is shared, so guard MoveNext/Current
            {
                if (!e.MoveNext())
                    return;            // nothing left; dispose the connection and exit
                item = e.Current;
            }
            var result = sqlConnection.Execute(myQuery, new { ... } );
        }
    }
}

Now create your own threads and invoke this method on them, all with one and the same parameter: the enumerator which runs through your collection. Each thread opens its own connection once, starts inserting, and after all items are inserted, the connection is closed. This solution uses only as many connections as the threads you create.
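For example, a usage sketch could look like this (the thread count of 4 and the MyItem element type are just placeholders):

var items = GetItemsFromSomewhere();                 // Returns 37K items.
IEnumerator<MyItem> enumerator = items.GetEnumerator();

var threads = new List<Thread>();
for (int i = 0; i < 4; i++)                          // 4 threads => at most 4 connections
{
    var thread = new Thread(BulkInsert<MyItem>);
    thread.Start(enumerator);                        // every thread shares the same enumerator
    threads.Add(thread);
}

threads.ForEach(t => t.Join());                      // wait until all items are inserted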

PS: Multiple variants of the above code are possible. You could call it from background threads, from Tasks, and so on. I hope you get the point.

Upvotes: 1

Jeff

Reputation: 850

You should use SqlBulkCopy instead of inserting the rows one by one. It's faster and more efficient.

https://msdn.microsoft.com/en-us/library/ex21zs8x(v=vs.110).aspx

Credit to the owner of this answer: Sql Bulk Copy/Insert in C#

Upvotes: 0
