Benjol

Reputation: 66551

'Streaming' data into Sql server

I'm working on a project where we're receiving data from multiple sources, that needs to be saved into various tables in our database.

Fast.

I've played with various methods, and the fastest I've found so far is using a collection of TableValue parameters, filling them up and periodically sending them to the database via a corresponding collection of stored procedures.

The results are quite satisfying. However, looking at disk usage (% Idle Time in Perfmon), I can see that the disk is getting periodically 'thrashed' (a 'spike' down to 0% every 13-18 seconds), whilst in between the %Idle time is around 90%. I've tried varying the 'batch' size, but it doesn't have an enormous influence.

  1. Should I be able to get better throughput by (somehow) avoiding the spikes while decreasing the overall idle time?
  2. What are some things I should be looking out for to work out where the spiking is happening? (The database is in Simple recovery mode, and pre-sized to 'big', so it's not the log file growing)
  3. Bonus: I've seen other questions referring to 'streaming' data into the database, but this seems to involve having a Stream from another database (last section here). Is there any way I could shoe-horn 'pushed' data into that?
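For context, the batched TVP approach described above can be sketched roughly like this. The table type (`dbo.PersonType`), stored procedure (`dbo.InsertPeople`) and parameter name are all hypothetical, stand-ins for whatever my actual schema uses:

```csharp
using System.Data;
using System.Data.SqlClient;

static void SendBatch(SqlConnection connection, DataTable batch)
{
    using (var command = new SqlCommand("dbo.InsertPeople", connection))
    {
        command.CommandType = CommandType.StoredProcedure;

        var tvp = command.Parameters.AddWithValue("@People", batch);
        tvp.SqlDbType = SqlDbType.Structured; // marks the parameter as a TVP
        tvp.TypeName = "dbo.PersonType";      // the user-defined table type on the server

        command.ExecuteNonQuery();
        batch.Clear();                        // reuse the same buffer for the next batch
    }
}
```

Server-side, the procedure is just an `INSERT INTO ... SELECT ... FROM @People`.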

(Screenshot: Perfmon % Idle Time trace showing the periodic spikes.)

Upvotes: 1

Views: 6337

Answers (3)

Benjol

Reputation: 66551

Building on the code referred to in alzaimar's answer, I've got a proof of concept working with IObservable (just to see if I can). It seems to work ok. I just need to put together some tidier code to see if this is actually any faster than what I already have.

(The following code only really makes sense in the context of the test program in the code download from the aforementioned article.)

Warning: NSFW, copy/paste at your peril!

private static void InsertDataUsingObservableBulkCopy(IEnumerable<Person> people, 
                                                      SqlConnection connection)
{
    var sub = new Subject<Person>();

    var bulkCopy = new SqlBulkCopy(connection);
    bulkCopy.DestinationTableName = "Person";
    bulkCopy.ColumnMappings.Add("Name", "Name");
    bulkCopy.ColumnMappings.Add("DateOfBirth", "DateOfBirth");

    // Read from the subject, not from 'people' directly, so the data actually
    // flows through the observable (ToEnumerable needs System.Reactive.Linq).
    using(var dataReader = new ObjectDataReader<Person>(sub.ToEnumerable()))
    {
        var task = Task.Factory.StartNew(() =>
        {
            bulkCopy.WriteToServer(dataReader);
        });
        var stopwatch = Stopwatch.StartNew();
        foreach(var person in people) sub.OnNext(person);
        sub.OnCompleted();
        task.Wait();
        Console.WriteLine("Observable Bulk copy: {0}ms",
                           stopwatch.ElapsedMilliseconds);
    }
}

Upvotes: 1

alzaimar

Reputation: 4622

A very easy way of inserting loads of data into SQL Server is, as mentioned, the bulk copy method. ADO.NET offers a very easy way of doing this without the need for external files. Here's the code:

var bulkCopy = new SqlBulkCopy(myConnection);
bulkCopy.DestinationTableName = "MyTable";
bulkCopy.WriteToServer(myDataTable);

That's easy.

But: myDataTable needs to have exactly the same structure as MyTable, i.e. names, field types and order of fields must be exactly the same. If not, there's a solution to that: column mapping. And this is even easier to do:

bulkCopy.ColumnMappings.Add("ColumnNameOfDataTable", "ColumnNameOfTable");

That's still easy.

But: myDataTable needs to fit into memory. If it doesn't, things become a bit trickier, as we then need an IDataReader derivative which allows us to instantiate it with an IEnumerable.

You might get all the information you need in this article.
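For what it's worth, a minimal sketch of such a derivative might look like the following. It only does real work in the members SqlBulkCopy actually exercises (Read, FieldCount, GetValue, GetOrdinal); the typed getters just cast, the streaming getters throw, and the real ObjectDataReader in the article is more complete:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Reflection;

// Minimal IDataReader over an IEnumerable<T>, driven by reflection.
public sealed class ObjectDataReader<T> : IDataReader
{
    private readonly IEnumerator<T> _source;
    private readonly PropertyInfo[] _props = typeof(T).GetProperties();

    public ObjectDataReader(IEnumerable<T> source)
    {
        _source = source.GetEnumerator();
    }

    // The members SqlBulkCopy relies on:
    public int FieldCount { get { return _props.Length; } }
    public bool Read() { return _source.MoveNext(); }
    public object GetValue(int i) { return _props[i].GetValue(_source.Current, null); }
    public int GetOrdinal(string name) { return Array.FindIndex(_props, p => p.Name == name); }
    public string GetName(int i) { return _props[i].Name; }
    public Type GetFieldType(int i) { return _props[i].PropertyType; }
    public bool IsDBNull(int i) { return GetValue(i) == null; }

    public int GetValues(object[] values)
    {
        int n = Math.Min(values.Length, FieldCount);
        for (int j = 0; j < n; j++) values[j] = GetValue(j);
        return n;
    }

    public void Dispose() { _source.Dispose(); }
    public void Close() { Dispose(); }
    public bool IsClosed { get { return false; } }
    public int Depth { get { return 0; } }
    public int RecordsAffected { get { return -1; } }
    public bool NextResult() { return false; }
    public DataTable GetSchemaTable() { throw new NotSupportedException(); }

    public object this[int i] { get { return GetValue(i); } }
    public object this[string name] { get { return GetValue(GetOrdinal(name)); } }

    // Typed getters just cast; streaming getters are not supported here.
    public bool GetBoolean(int i) { return (bool)GetValue(i); }
    public byte GetByte(int i) { return (byte)GetValue(i); }
    public char GetChar(int i) { return (char)GetValue(i); }
    public DateTime GetDateTime(int i) { return (DateTime)GetValue(i); }
    public decimal GetDecimal(int i) { return (decimal)GetValue(i); }
    public double GetDouble(int i) { return (double)GetValue(i); }
    public float GetFloat(int i) { return (float)GetValue(i); }
    public Guid GetGuid(int i) { return (Guid)GetValue(i); }
    public short GetInt16(int i) { return (short)GetValue(i); }
    public int GetInt32(int i) { return (int)GetValue(i); }
    public long GetInt64(int i) { return (long)GetValue(i); }
    public string GetString(int i) { return (string)GetValue(i); }
    public string GetDataTypeName(int i) { return _props[i].PropertyType.Name; }
    public IDataReader GetData(int i) { throw new NotSupportedException(); }
    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferOffset, int length) { throw new NotSupportedException(); }
    public long GetChars(int i, long fieldOffset, char[] buffer, int bufferOffset, int length) { throw new NotSupportedException(); }
}
```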

Upvotes: 1

TheCodeKing

Reputation: 19220

It's difficult to comment without knowing the specifics, but one of the fastest ways to get data into SQL Server is Bulk Insert from a file.

You could write the incoming data to a temp file and periodically bulk insert it.
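A rough sketch of that temp-file round-trip, assuming CSV and a hypothetical dbo.Person table (and note that BULK INSERT reads the file server-side, so the path must be visible to the SQL Server service account, not just the client):

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;
using System.IO;

static void FlushViaBulkInsert(SqlConnection connection, IEnumerable<string> csvLines)
{
    // Write the buffered rows out; this path must be readable by the server.
    var path = Path.Combine(Path.GetTempPath(), "incoming.csv");
    File.WriteAllLines(path, csvLines);

    // BULK INSERT takes the file name as a literal, hence the string.Format.
    var sql = string.Format(
        "BULK INSERT dbo.Person FROM '{0}' " +
        "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', TABLOCK)", path);

    using (var command = new SqlCommand(sql, connection))
    {
        command.ExecuteNonQuery();
    }
}
```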

Streaming data into SQL Server via a Table-Valued parameter also looks like a good solution for fast inserts, as the rows are held in memory. In answer to your question: yes, you could use this; you just need to turn your data into an IDataReader. There are various ways to do this, from a DataTable for example, see here.
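For the DataTable route specifically, DataTable.CreateDataReader hands you the IDataReader directly; something like this (table and column names are made up, and connectionString is assumed to be defined elsewhere):

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

// Build an in-memory DataTable, then hand its reader to SqlBulkCopy.
var table = new DataTable();
table.Columns.Add("Name", typeof(string));
table.Columns.Add("DateOfBirth", typeof(DateTime));
table.Rows.Add("Ada Lovelace", new DateTime(1815, 12, 10));

using (var connection = new SqlConnection(connectionString))
using (var reader = table.CreateDataReader())
{
    connection.Open();
    var bulkCopy = new SqlBulkCopy(connection);
    bulkCopy.DestinationTableName = "Person";
    bulkCopy.WriteToServer(reader);   // streams rows through the IDataReader
}
```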

If your disk is a bottleneck, you could always optimise your infrastructure: put the database on a RAM disk or SSD, for example.

Upvotes: 0
