Reputation: 41
I am working on a project where I need to continually extract information from multiple servers (fewer than 1000) and write most of it into a database. I've narrowed my choices down to two:
Edit: This is a client, so I will be generating the connections and requesting information periodically.
1 - Using the asynchronous approach, create N sockets to poll, decide in the callback whether the information should be written to the database, and put the useful information into a buffer. Then write the information from the buffer using a timer.
2 - Using the multithreading approach, create N threads with one socket per thread. The buffer of useful information would remain on the main thread, and so would the periodic writing.
Both options in fact use multiple threads; the second one just seems to add the extra difficulty of creating each thread manually. Are there any merits to it? Is writing with a timer wise?
Upvotes: 4
Views: 1793
Reputation: 133995
Another option is to have a dedicated thread whose only job is to service the buffer and write data to the database as fast as it can. So when you make a connection and get data, that data is placed in the buffer. But you have one thread that's always looking at the buffer and writing data to the database as it comes in from the other connections.
Create the buffer as a BlockingCollection<T>. Use asynchronous requests as suggested in a previous answer. And have a single dedicated thread that reads the data and writes it to the database:
BlockingCollection<DataType> _theQueue = new BlockingCollection<DataType>(MaxBufferSize);

// add data with
_theQueue.Add(dataItem);

// service the queue with a simple loop
foreach (var dataItem in _theQueue.GetConsumingEnumerable())
{
    // write dataItem to the database
}
When you want to shut down (i.e. no more data is being read from the servers), you mark the queue as complete for adding. The consumer thread will then empty the queue, note that it's marked as complete for adding, and the loop will exit.
// mark the queue as complete for adding
_theQueue.CompleteAdding();
You need to make the buffer large enough to handle bursts of information.
If writing one record at a time to the database isn't fast enough, you can modify the consumer loop to fill its own internal buffer with some number of records (10? 100? 1000?) and write them to the database in one shot. How you do that will of course depend on your database server, but you should be able to come up with some form of bulk insert that reduces the number of round trips you make to the database.
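For example, a minimal sketch of such a batching consumer. BatchSize and WriteBatchToDatabase are placeholders for your own tuning value and bulk-insert routine (e.g. SqlBulkCopy on SQL Server), not part of any library:

var batch = new List<DataType>(BatchSize);
foreach (var dataItem in _theQueue.GetConsumingEnumerable())
{
    batch.Add(dataItem);
    if (batch.Count >= BatchSize)
    {
        WriteBatchToDatabase(batch);   // one round trip for the whole batch
        batch.Clear();
    }
}
// flush whatever is left once the queue is marked complete for adding
if (batch.Count > 0)
    WriteBatchToDatabase(batch);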
Upvotes: 1
Reputation: 171178
With 1000 connections async IO is usually a good idea because it does not block threads while the IO is in progress. (It does not even use a background thread to wait.) That makes (1) the better alternative.
It is not clear from the question what you would need a timer for. Maybe for buffering writes? That would be valid, but it seems to be a separate issue from the one you are asking about.
Polling has no place in a modern async IO application. The system calls your callback (or completes your IO Task) when it is done. The callback is queued to the thread pool. This allows you to not worry about that; it just happens.
The code that reads data should look like this:
while (true) {
    var msg = await ReadMessageAsync(socket);
    if (msg == null) break;
    await WriteDataAsync(msg);
}
Very simple. No blocking of threads. No callbacks.
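If it helps, here is a rough sketch of how a loop like that could be run for all of your servers at once. ReadMessageAsync and WriteDataAsync stand in for your own read/parse and buffering code (they are not framework methods), and sockets is whatever collection of connected sockets you maintain:

// one read loop per server; none of them blocks a thread while waiting
async Task PollServerAsync(Socket socket)
{
    while (true) {
        var msg = await ReadMessageAsync(socket);
        if (msg == null) break;        // server closed the connection
        await WriteDataAsync(msg);     // or add msg to your shared buffer
    }
}

// start all the loops and wait for them to finish
await Task.WhenAll(sockets.Select(s => PollServerAsync(s)));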
Upvotes: 6
Reputation: 6222
In answer to the "is using a timer wise" question, perhaps it is better to make your buffer autoflush when it reaches either a certain age or a certain size. This is the way the in-memory cache in the .NET Framework works: the cache is set to both a maximum size and a maximum staleness.
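A rough sketch of that idea (my own illustration, not the MemoryCache API): a buffer that hands its contents to a flush delegate, such as a bulk insert, when it reaches a maximum count or a maximum age. A real implementation would also need a timer so a quiet buffer still gets flushed by age:

using System;
using System.Collections.Generic;

class AutoFlushBuffer<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly int _maxItems;
    private readonly TimeSpan _maxAge;
    private readonly Action<IReadOnlyList<T>> _flush;   // e.g. a bulk insert
    private DateTime _oldestAdded = DateTime.UtcNow;

    public AutoFlushBuffer(int maxItems, TimeSpan maxAge, Action<IReadOnlyList<T>> flush)
    {
        _maxItems = maxItems;
        _maxAge = maxAge;
        _flush = flush;
    }

    public void Add(T item)
    {
        lock (_items)
        {
            if (_items.Count == 0) _oldestAdded = DateTime.UtcNow;
            _items.Add(item);
            if (_items.Count >= _maxItems || DateTime.UtcNow - _oldestAdded >= _maxAge)
            {
                _flush(_items.ToArray());   // hand off a snapshot of the buffer
                _items.Clear();
            }
        }
    }
}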
Resiliency on failure might be a concern, as well as the possibility that peak loads might blow your buffer if it's an in-memory one. You might consider making your buffer local but persistent - for instance using MSMQ or a similar high-speed queue technology. I've seen this done successfully: if you make the buffer write asynchronous (i.e. "fire and forget"), it has almost no impact on the ability to service the input, and it allows the database population code to pull from the persistent buffer(s) whenever it needs to or whenever prompted to.
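As a rough illustration of the persistent local buffer idea, assuming MSMQ via System.Messaging (the queue path and DataType here are placeholders, not anything prescribed):

using System.Messaging;

// create or open a local private queue that survives process restarts
const string path = @".\private$\serverData";
var queue = MessageQueue.Exists(path) ? new MessageQueue(path) : MessageQueue.Create(path);
queue.Formatter = new XmlMessageFormatter(new[] { typeof(DataType) });

// producer side: sending to a local queue returns quickly,
// so it barely affects the code servicing the connections
queue.Send(dataItem);

// consumer side: the database writer pulls from the durable buffer
// whenever it is ready (or whenever prompted to)
var message = queue.Receive();              // blocks until a message is available
var data = (DataType)message.Body;
// write data to the database here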
Upvotes: 1
Reputation: 13745
For option (1) you could write the qualifying information to a queue and then listen on the queue with your database writer. This gives your database some breathing space during peak loads and avoids requests backing up while they wait for a timer.
A persistent queue would give you some resilience too.
Upvotes: 0