Reputation: 2670
At work one of our processes uses a SQL database table as a queue. I've been designing a queue reader to check the table for queued work, update the row status when work starts, and delete the row when the work is finished. I'm using Parallel.Foreach
to give each process its own thread and setting MaxDegreeOfParallelism
to 4.
When the queue reader starts up it checks for any unfinished work and loads the work into an list, then it does a Concat
with that list and a method that returns an IEnumerable
which runs in an infinite loop checking for new work to do. The idea is that the unfinished work should be processed first and then the new work can be worked as threads are available. However what I'm seeing is that FetchQueuedWork
will change dozens of rows in the queue table to 'Processing' immediately but only work on a few items at a time.
What I expected to happen was that FetchQueuedWork
would only get new work and update the table when a slot opened up in the Parallel.Foreach
. What's really odd to me is that it behaves exactly as I would expect when I run the code in my local developer environment, but in production I get the above problem.
I'm using .Net 4. Here is the code:
public void Go()
{
List<WorkData> unfinishedWork = WorkData.LoadUnfinishedWork();
IEnumerable<WorkData> work = unfinishedWork.Concat(FetchQueuedWork());
Parallel.ForEach(work, new ParallelOptions { MaxDegreeOfParallelism = 4 }, DoWork);
}
private IEnumerable<WorkData> FetchQueuedWork()
{
while (true)
{
var workUnit = WorkData.GetQueuedWorkAndSetStatusToProcessing();
yield return workUnit;
}
}
private void DoWork(WorkData workUnit)
{
if (!workUnit.Loaded)
{
System.Threading.Thread.Sleep(5000);
return;
}
Work();
}
Upvotes: 1
Views: 133
Reputation: 9587
Blorgbeard's solution is correct when it comes to .NET 4.5 - hands down.
If you are constrained to .NET 4, you have a few options:
Replace your Parallel.ForEach
with work.AsParallel().WithDegreeOfParallelism(4).ForAll(DoWork)
. PLINQ is more conservative when it comes to buffering items, so this should do the trick.
Write your own enumerable partitioner (good luck).
(Side-effecting Select
used for the sake of brevity)
public void Go()
{
const int MAX_DEGREE_PARALLELISM = 4;
using (var semaphore = new SemaphoreSlim(MAX_DEGREE_PARALLELISM, MAX_DEGREE_PARALLELISM))
{
List<WorkData> unfinishedWork = WorkData.LoadUnfinishedWork();
IEnumerable<WorkData> work = unfinishedWork
.Concat(FetchQueuedWork())
.Select(w =>
{
// Side-effect: bad practice, but easier
// than writing your own IEnumerable.
semaphore.Wait();
return w;
});
// You still need to specify MaxDegreeOfParallelism
// here so as not to saturate your thread pool when
// Parallel.ForEach's load balancer kicks in.
Parallel.ForEach(work, new ParallelOptions { MaxDegreeOfParallelism = MAX_DEGREE_PARALLELISM }, workUnit =>
{
try
{
this.DoWork(workUnit);
}
finally
{
semaphore.Release();
}
});
}
}
Upvotes: 3
Reputation: 103525
I suspect that the default (Release mode?) behaviour is to buffer the input. You might need to create your own partitioner and pass it the NoBuffering
option:
List<WorkData> unfinishedWork = WorkData.LoadUnfinishedWork();
IEnumerable<WorkData> work = unfinishedWork.Concat(FetchQueuedWork());
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
var partitioner = Partitioner.Create(work, EnumerablePartitionerOptions.NoBuffering);
Parallel.ForEach(partioner, options, DoWork);
Upvotes: 3