Parallel.ForEach behaving like a regular for each towards the end of the iteration

Question

I run into this issue when I run something like this:

Parallel.ForEach(dataTable.AsEnumerable(), row =>
{
   //do processing
});

Assume there are 500+ records, say 870. Once Parallel.ForEach has completed about 850 of them, it seems to run sequentially, i.e. only one operation at a time. It completes the first 850 operations very quickly, but close to the end of the iteration it becomes very slow and seems to behave like a regular foreach. I also tried with 2000 records.

Is something wrong in my code? Please give suggestions.

Below is the code I am using.

Sorry, I posted the wrong example at first. This is the correct code:

Task newTask = Task.Factory.StartNew(() =>
{
    Parallel.ForEach(dtResult.AsEnumerable(), dr =>
    {
        string extractQuery = "";
        string downLoadFileFullName = "";
        lock (foreachObject)
        {

            string fileName = extractorConfig.EncodeFileName(dr);
            extractQuery = extractorConfig.GetExtractQuery(dr);
            if (string.IsNullOrEmpty(extractQuery)) throw new Exception("Extract Query not found. Please check the configuration");

            string newDownLoadPath = CommonUtil.GetFormalizedDataPath(sDownLoadPath, uKey.CobDate);
            //create folder if it doesn't exist
            if (!Directory.Exists(newDownLoadPath)) Directory.CreateDirectory(newDownLoadPath);
            downLoadFileFullName = Path.Combine(newDownLoadPath, fileName);
        }
        Interlocked.Increment(ref index);

        ExtractorClass util = new ExtractorClass(SourceDbConnStr);
        util.LoadToFile(extractQuery, downLoadFileFullName);
        Interlocked.Increment(ref uiTimerIndex);
    });
});

Tim Lloyd · Accepted Answer

My guess:

This looks to have a high degree of potential IO from:

Database+Disk
Network communication to DB and back
Writing results to disk

Therefore a lot of time is going to be spent waiting for IO. My guess is that the waiting only gets worse as more threads are added to the mix and IO is stressed further. For instance, a disk has only one set of heads, so it cannot service several writes at the same time. If a large number of threads try to write concurrently, performance degrades.

Try limiting the maximum number of threads you are using:

var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

Parallel.ForEach(dtResult.AsEnumerable(), options, dr =>
{
    //Do stuff
});
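
To experiment with the limit, it can help to measure how many loop bodies actually overlap for a given setting. The following is a minimal sketch, not the asker's code: ParallelismProbe and MeasurePeakConcurrency are hypothetical names, and Thread.Sleep is only a stand-in for the blocking database and file work.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelismProbe
{
    // Returns the highest number of loop bodies observed running at the same time.
    public static int MeasurePeakConcurrency(int itemCount, int maxDegreeOfParallelism)
    {
        int current = 0;
        int peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };

        Parallel.ForEach(Enumerable.Range(0, itemCount), options, _ =>
        {
            int now = Interlocked.Increment(ref current);

            // Record a new peak with a compare-and-swap loop.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)))
            {
                if (Interlocked.CompareExchange(ref peak, now, seen) == seen) break;
            }

            Thread.Sleep(20); // stand-in for blocking IO (assumption)

            Interlocked.Decrement(ref current);
        });

        return peak;
    }
}
```

Calling ParallelismProbe.MeasurePeakConcurrency(40, 2) should never report a peak above 2. Replacing the sleep with the real work and timing the loop for a few different limits is one way to pick a value.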

Update

After your code edit, I would suggest the following, which makes a couple of changes:

Reduce the maximum number of threads; the exact value can be experimented with.
Only perform the directory check and creation once.

Code:

private static bool isDirectoryCreated;

//...

var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

Parallel.ForEach(dtResult.AsEnumerable(), options, dr =>
{
    string fileName, extractQuery, newDownLoadPath;

    lock (foreachObject)
    {
        fileName = extractorConfig.EncodeFileName(dr);

        extractQuery = extractorConfig.GetExtractQuery(dr);

        if (string.IsNullOrEmpty(extractQuery))
            throw new Exception("Extract Query not found. Please check the configuration");

        newDownLoadPath = CommonUtil.GetFormalizedDataPath(sDownLoadPath, uKey.CobDate);

        if (!isDirectoryCreated)
        {
            if (!Directory.Exists(newDownLoadPath))
                Directory.CreateDirectory(newDownLoadPath);

            isDirectoryCreated = true;
        }
    }

    string downLoadFileFullName = Path.Combine(newDownLoadPath, fileName);

    Interlocked.Increment(ref index);

    ExtractorClass util = new ExtractorClass(SourceDbConnStr);
    util.LoadToFile(extractQuery, downLoadFileFullName);

    Interlocked.Increment(ref uiTimerIndex);
});
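
Since the download path in the code above does not appear to depend on the row, the directory could also be created once before the loop starts, which would take the flag out of the loop entirely. This is a minimal sketch under that assumption: ExportSketch and WriteAll are hypothetical names, and File.WriteAllText is only a stand-in for util.LoadToFile. Directory.CreateDirectory does nothing when the directory already exists, so no Directory.Exists check is needed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ExportSketch
{
    // Writes one file per name into targetDirectory and returns how many were written.
    public static int WriteAll(string targetDirectory, IEnumerable<string> fileNames, int maxDegreeOfParallelism)
    {
        // Created once, before any worker starts; a no-op if it already exists.
        Directory.CreateDirectory(targetDirectory);

        int written = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };

        Parallel.ForEach(fileNames, options, fileName =>
        {
            string fullName = Path.Combine(targetDirectory, fileName);
            File.WriteAllText(fullName, "placeholder"); // stand-in for util.LoadToFile (assumption)
            Interlocked.Increment(ref written);
        });

        return written;
    }
}
```

Whether the remaining work inside the lock can also move out depends on whether extractorConfig is safe to call from several threads, which the question does not say.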
