Reputation: 839
I run into an issue when I run something like this:
Parallel.ForEach(dataTable.AsEnumerable(), row =>
{
    //do processing
});
Assume there are 500+ records, say 870. Once Parallel.ForEach has completed about 850 of them, it seems to run sequentially, i.e. only one operation at a time. The first 850 operations complete very quickly, but near the end of the iteration it becomes very slow and seems to perform like a regular foreach. I even tried with 2000 records.
Is something wrong in my code? Please give suggestions.
Below is the code I am using
Sorry, I just posted the wrong example. This is the correct code:
Task newTask = Task.Factory.StartNew(() =>
{
    Parallel.ForEach(dtResult.AsEnumerable(), dr =>
    {
        string extractQuery = "";
        string downLoadFileFullName = "";
        lock (foreachObject)
        {
            string fileName = extractorConfig.EncodeFileName(dr);
            extractQuery = extractorConfig.GetExtractQuery(dr);
            if (string.IsNullOrEmpty(extractQuery)) throw new Exception("Extract Query not found. Please check the configuration");
            string newDownLoadPath = CommonUtil.GetFormalizedDataPath(sDownLoadPath, uKey.CobDate);
            //create folder if it doesn't exist
            if (!Directory.Exists(newDownLoadPath)) Directory.CreateDirectory(newDownLoadPath);
            downLoadFileFullName = Path.Combine(newDownLoadPath, fileName);
        }
        Interlocked.Increment(ref index);
        ExtractorClass util = new ExtractorClass(SourceDbConnStr);
        util.LoadToFile(extractQuery, downLoadFileFullName);
        Interlocked.Increment(ref uiTimerIndex);
    });
});
Upvotes: 3
Views: 4819
Reputation: 38434
My guess:
The work inside the loop looks to involve a high degree of IO.
Therefore a lot of time is going to be spent waiting for IO. My guess is that the waiting only gets worse as more threads are added to the mix and IO is stressed further. For instance, a disk has only one set of heads, so it cannot service writes concurrently. If a large number of threads try to write at the same time, performance degrades.
Try limiting the maximum number of threads you are using:
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
Parallel.ForEach(dtResult.AsEnumerable(), options, dr =>
{
    //Do stuff
});
Update
After your code edit, I would suggest the following which has a couple of changes:
Code:
private static bool isDirectoryCreated;
//...
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
Parallel.ForEach(dtResult.AsEnumerable(), options, dr =>
{
    string fileName, extractQuery, newDownLoadPath;
    lock (foreachObject)
    {
        fileName = extractorConfig.EncodeFileName(dr);
        extractQuery = extractorConfig.GetExtractQuery(dr);
        if (string.IsNullOrEmpty(extractQuery))
            throw new Exception("Extract Query not found. Please check the configuration");
        newDownLoadPath = CommonUtil.GetFormalizedDataPath(sDownLoadPath, uKey.CobDate);
        if (!isDirectoryCreated)
        {
            if (!Directory.Exists(newDownLoadPath))
                Directory.CreateDirectory(newDownLoadPath);
            isDirectoryCreated = true;
        }
    }
    string downLoadFileFullName = Path.Combine(newDownLoadPath, fileName);
    Interlocked.Increment(ref index);
    ExtractorClass util = new ExtractorClass(SourceDbConnStr);
    util.LoadToFile(extractQuery, downLoadFileFullName);
    Interlocked.Increment(ref uiTimerIndex);
});
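A further variation, offered only as a sketch: in the code above the download path is built from sDownLoadPath and uKey.CobDate, neither of which depends on the row, so the directory could probably be created once before the loop starts instead of being checked under the lock. Directory.CreateDirectory does nothing if the directory already exists. Whether the remaining lock can be dropped depends on whether extractorConfig is thread-safe, which I can't tell from the question. The names HoistedSetupDemo and ExportAll, and the File.WriteAllText call standing in for util.LoadToFile, are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class HoistedSetupDemo
{
    // Hypothetical stand-in for the extract loop: writes one small file per row.
    public static int ExportAll(IEnumerable<int> rows, string downloadPath, int maxParallelism)
    {
        // Loop-invariant work is done once, before any worker starts.
        // Directory.CreateDirectory is a no-op if the directory already exists.
        Directory.CreateDirectory(downloadPath);

        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };
        int completed = 0;

        Parallel.ForEach(rows, options, row =>
        {
            // Per-row work only. The counter is the only shared mutable state,
            // and it is updated atomically.
            string fullName = Path.Combine(downloadPath, "extract_" + row + ".txt");
            File.WriteAllText(fullName, "row " + row); // stand-in for util.LoadToFile(...)
            Interlocked.Increment(ref completed);
        });

        return completed;
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "hoisted_setup_demo");
        int done = ExportAll(Enumerable.Range(0, 20), path, 2);
        Console.WriteLine("Files written: " + done);
    }
}
```

With the directory handled up front, the isDirectoryCreated flag is no longer needed, and the lock only has to cover whatever in extractorConfig is not safe to call concurrently.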
Upvotes: 3
Reputation: 545588
It’s hard to give details without the relevant code but in general this is the expected behaviour. .NET tries to schedule the tasks such that every processor is evenly busy.
But this can only ever be approximated since not all of the tasks take the same amount of time. At the end some processors will be done working and some won’t, and re-distributing the work is costly and not always beneficial.
I don’t know the details of the load balancing used by the Task Parallel Library, but the bottom line is that this behaviour can never be fully prevented.
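If the slow tail comes from workers holding on to a chunk of unfinished items while others sit idle, one thing that may be worth trying is a partitioner that hands out items one at a time. The sketch below uses Partitioner.Create with EnumerablePartitionerOptions.NoBuffering for that. It is an assumption that chunking is the cause in the question's case, and the names NoBufferingDemo and ProcessAll, as well as the Thread.Sleep standing in for real work, are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class NoBufferingDemo
{
    // Processes every item and returns how many were handled.
    public static int ProcessAll(IEnumerable<int> items, int maxParallelism)
    {
        // NoBuffering asks the partitioner to give each worker a single item
        // at a time rather than a buffered chunk of items.
        var partitioner = Partitioner.Create(items, EnumerablePartitionerOptions.NoBuffering);
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };
        int processed = 0;

        Parallel.ForEach(partitioner, options, item =>
        {
            Thread.Sleep(item % 5); // stand-in for work of uneven duration
            Interlocked.Increment(ref processed);
        });

        return processed;
    }

    public static void Main()
    {
        int count = ProcessAll(Enumerable.Range(0, 100), 4);
        Console.WriteLine("Processed: " + count);
    }
}
```

Taking items one at a time adds some synchronization overhead per item, so this tends to pay off only when each item is fairly expensive, as a database extract plus file write presumably is.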
Upvotes: 2