Viktor KurDz

Reputation: 41

Running a LINQ query with .AsParallel() takes much longer inside separate Tasks in C#

In a LINQ query, I have used .AsParallel() as follows:

var completeReservationItems = from rBase in reservation.AsParallel()
                                       join rRel in relationship.AsParallel() on rBase.GroupCode equals rRel.SourceGroupCode
                                       join rTarget in reservation.AsParallel() on rRel.TargetCode equals rTarget.GroupCode
                                       where rRel.ProgramCode == programCode && rBase.StartDate <= rTarget.StartDate && rBase.EndDate >= rTarget.EndDate
                                       select new Object
                                       {
                                           //Initialize based on the query
                                       };

I then created two separate Tasks and ran them in parallel, passing the same lists to both methods:

Task getS1Status = Task.Factory.StartNew(
    () =>
    {
        RunLinqQuery(params);
    });
Task getS2Status = Task.Factory.StartNew(
    () =>
    {
        RunLinqQuery(params);
    });

Task.WaitAll(getS1Status, getS2Status);

I measured the timings and was surprised by the results:

  1. Above scenario: 6 sec (6000 ms)
  2. Same code, running sequentially instead of 2 Tasks: 50 ms
  3. Same code, but without .AsParallel() in the LINQ: 50 ms

I would like to understand why the first scenario takes so long.

Upvotes: 0

Views: 666

Answers (1)

tym32167

Reputation: 4881

Posting this as an answer only because I have some code to show.

First, I don't know how many threads AsParallel() will create. The documentation doesn't say anything about it: https://msdn.microsoft.com/en-us/library/dd413237(v=vs.110).aspx

Consider the following code:

void RunMe()
{
    foreach (var threadId in Enumerable.Range(0, 100)
                            .AsParallel()
                            .Select(x => Thread.CurrentThread.ManagedThreadId)
                            .Distinct())
        Console.WriteLine(threadId);
}

How many thread IDs will we see? For me, the number of threads differs from run to run. Example output:

30 // only one thread!

Next time:

27 // several threads
13
38
10
43
30

I think the number of threads depends on the current scheduler. We can always set the maximum number of threads by calling the WithDegreeOfParallelism method (https://msdn.microsoft.com/en-us/library/dd383719(v=vs.110).aspx), for example:

void RunMe()
{
    foreach (var threadId in Enumerable.Range(0, 100)
                            .AsParallel()
                            .WithDegreeOfParallelism(2)
                            .Select(x => Thread.CurrentThread.ManagedThreadId)
                            .Distinct())
        Console.WriteLine(threadId);
}

Now the output will contain at most 2 thread IDs:

7
40

Why is this important? As I said, the number of threads can directly influence performance. But that is not the only problem. In your scenario 1 you create new tasks (which run on the thread pool and can add overhead of their own), and then you call Task.WaitAll. Take a look at its source code: https://referencesource.microsoft.com/#mscorlib/system/threading/Tasks/Task.cs,72b6b3fa5eb35695 . I'm sure the for loop over the tasks there adds some overhead, and if AsParallel takes too many threads inside the first task, the second task may have to wait before it can start. Moreover, this can happen but does not have to, so if you run scenario 1 a thousand times you will probably get very different results.
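
One way to take the thread count out of the picture is to cap each query with WithDegreeOfParallelism before running the two tasks. This is only a minimal sketch under assumptions: CappedParallelSketch, RunCappedQuery and RunTwoTasks are hypothetical names, and the even-number count is a stand-in for the real join from the question. Whether this removes the 6-second slowdown is not established here; it only makes the number of threads per query predictable.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper, not code from the question or the answer.
public static class CappedParallelSketch
{
    // Stand-in for RunLinqQuery: a PLINQ query limited to at most
    // maxThreads concurrently executing tasks.
    public static int RunCappedQuery(int[] source, int maxThreads)
    {
        return source
            .AsParallel()
            .WithDegreeOfParallelism(maxThreads)
            .Count(x => x % 2 == 0);
    }

    // Runs the same capped query in two tasks and waits for both,
    // mirroring the two-task setup from the question.
    public static int[] RunTwoTasks(int[] source, int maxThreadsPerQuery)
    {
        Task<int> first = Task.Run(() => RunCappedQuery(source, maxThreadsPerQuery));
        Task<int> second = Task.Run(() => RunCappedQuery(source, maxThreadsPerQuery));

        Task.WaitAll(first, second);
        return new[] { first.Result, second.Result };
    }
}
```

For example, `CappedParallelSketch.RunTwoTasks(data, 2)` lets each query use at most two threads, so the two tasks together ask the thread pool for a small, known number of workers.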

So my last point is that you are trying to measure parallel code, and that is very hard to do right. I don't recommend using parallelism wherever you can, because it can degrade performance if you don't know exactly what you are doing.
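
To make such measurements a bit more trustworthy, it helps to do an untimed warm-up run and then time several runs instead of one. The following is a rough sketch, not a proper benchmark harness: TimingSketch, Measure and Summarize are hypothetical names introduced for illustration, and the warm-up only reduces one-off costs such as JIT compilation and thread pool ramp-up, it does not eliminate them.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper, not code from the question or the answer.
public static class TimingSketch
{
    // Calls the action once without timing it (warm-up), then times
    // `runs` further calls and returns the elapsed milliseconds of each.
    public static double[] Measure(Action action, int runs)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));

        action(); // warm-up run, deliberately not timed

        var timings = new double[runs];
        for (int i = 0; i < runs; i++)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            action();
            stopwatch.Stop();
            timings[i] = stopwatch.Elapsed.TotalMilliseconds;
        }
        return timings;
    }

    // Returns the minimum, median and maximum of the timings
    // (for an even count, the upper of the two middle values).
    public static (double Min, double Median, double Max) Summarize(double[] timings)
    {
        if (timings == null || timings.Length == 0)
            throw new ArgumentException("At least one timing is required.", nameof(timings));

        double[] sorted = timings.OrderBy(t => t).ToArray();
        return (sorted[0], sorted[sorted.Length / 2], sorted[sorted.Length - 1]);
    }
}
```

Calling `TimingSketch.Summarize(TimingSketch.Measure(() => RunLinqQuery(...), 100))` for each of the three scenarios shows the spread between runs, which says more than a single number per scenario.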

Upvotes: 1
