FIre Panda

Reputation: 6637

Thread vs Parallel.For performance

I am struggling to understand the difference between threads and Parallel.For. I created two functions: one uses Parallel.For, the other invokes threads. Invoking 10 threads would appear to be faster, can anyone please explain? Would threads use multiple processors available in the system (to get executed in parallel) or does it just do time slicing in reference to CLR?

public static bool ParallelProcess()
{
    Stopwatch sw = new Stopwatch();

    sw.Start();
    Parallel.For(0, 10, x =>
    {
        Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
            Thread.CurrentThread.ManagedThreadId));
        Thread.Sleep(3000);
    });
    sw.Stop();
    Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));

    return true;
}

public static bool ParallelThread()
{
    Stopwatch sw = new Stopwatch();

    sw.Start();
    for (int i = 0; i < 10; i++)
    {
        Thread t = new Thread(new ThreadStart(Thread1));
        t.Start();
        if (i == 9)
            t.Join();
    }
    sw.Stop();
    Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));

    return true;
}

private static void Thread1()
{
    Console.WriteLine(string.Format("Printing {0} thread = {1}", 0,
           Thread.CurrentThread.ManagedThreadId));
    Thread.Sleep(3000);
}

When I called the methods below, Parallel.For took twice as long as the threads.

Algo.ParallelThread(); //took 3 secs
Algo.ParallelProcess();  //took 6 secs

Upvotes: 2

Views: 994

Answers (4)

user1451111

Reputation: 1943

To put it in the simplest terms: using the Thread class guarantees that a thread is created at the operating-system level, whereas with Parallel.For the CLR thinks twice before spawning OS-level threads. If it decides it is a good time to create an OS-level thread, it goes ahead; otherwise it uses the threads already available in the thread pool. The TPL is written to be optimized for multi-core environments.
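One way to observe this difference is to ask the running thread whether it belongs to the pool. The following is only a minimal sketch: the class and method names (PoolCheck, RunsOnPoolWithNewThread, RunsOnPoolWithTaskRun) are made up for illustration, and it uses Task.Run rather than Parallel.For because Parallel.For may also run some iterations on the calling thread, which would blur the result.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PoolCheck
{
    // Starts a dedicated thread and reports whether it is a thread-pool thread.
    public static bool RunsOnPoolWithNewThread()
    {
        bool onPool = true;
        var t = new Thread(() => onPool = Thread.CurrentThread.IsThreadPoolThread);
        t.Start();
        t.Join(); // after Join, the value written by the thread is visible here
        return onPool;
    }

    // Queues work through the TPL (default scheduler) and reports the same flag.
    public static bool RunsOnPoolWithTaskRun()
    {
        return Task.Run(() => Thread.CurrentThread.IsThreadPoolThread).Result;
    }

    public static void Main()
    {
        Console.WriteLine("new Thread -> pool thread? {0}", RunsOnPoolWithNewThread());
        Console.WriteLine("Task.Run   -> pool thread? {0}", RunsOnPoolWithTaskRun());
    }
}
```

With the default task scheduler, the dedicated thread should report that it is not a pool thread, while the Task.Run delegate should report that it is.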

Upvotes: 0

Enigmativity

Reputation: 117064

You've got a bunch of things here that are going wrong.

(1) Don't use sw.Elapsed.Seconds. This value is an int and (obviously) truncates the fractional part of the time. Worse, though, if you have a process that takes 61 seconds to complete, it will report 1, because it works like the second hand on a clock. You should instead use sw.Elapsed.TotalSeconds, which is a double and gives the total number of seconds regardless of how many minutes, hours, etc. have passed.

(2) Parallel.For uses the thread-pool. This significantly reduces (or even eliminates) the overhead for creating threads. Each time you call new Thread(() => ...) you are allocating over 1MB of RAM and chewing up valuable resources before any processing can take place.

(3) You're artificially loading up the threads with Thread.Sleep(3000); and this means you are overshadowing the actual time it takes to create threads with a massive sleep.

(4) Parallel.For is, by default, limited by the number of cores on your CPU. So when you run 10 threads the work is being cut into two steps, meaning that the Thread.Sleep(3000); is being run twice in series, hence the 6 seconds that it's running. The new Thread approach is running all of the threads in one go, meaning that it takes just over 3 seconds, but again, the Thread.Sleep(3000); is swamping the thread start-up time.

(5) You're also dealing with a CLR JIT issue. The first time you run your code the start-up costs are enormous. Let's change the code to remove the sleeps and to properly join the threads:

public static bool ParallelProcess()
{
    Stopwatch sw = new Stopwatch();

    sw.Start();
    Parallel.For(0, 10, x =>
    {
        Console.WriteLine(string.Format("Printing {0} thread = {1}", x, Thread.CurrentThread.ManagedThreadId));
    });
    sw.Stop();
    Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.TotalMilliseconds));

    return true;
}

public static bool ParallelThread()
{
    Stopwatch sw = new Stopwatch();

    sw.Start();
    var threads = Enumerable.Range(0, 10).Select(x => new Thread(new ThreadStart(Thread1))).ToList();
    foreach (var thread in threads) thread.Start();
    foreach (var thread in threads) thread.Join();
    sw.Stop();

    Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.TotalMilliseconds));

    return true;
}

private static void Thread1()
{
    Console.WriteLine(string.Format("Printing {0} thread = {1}", 0, Thread.CurrentThread.ManagedThreadId));  
}

Now, to get rid of the CLR/JIT start up time, let's run the code like this:

ParallelProcess();
ParallelThread();
ParallelProcess();
ParallelThread();
ParallelProcess();
ParallelThread();   

The times we get (in milliseconds, since the code prints TotalMilliseconds despite the "secs" label) are like this:

Time in secs 3.8617
Time in secs 4.7719
Time in secs 0.3633
Time in secs 1.6332
Time in secs 0.3551
Time in secs 1.6148

The starting run times are terrible compared to the second and third runs that are far more consistent.

The result is that running Parallel.For is 4 to 5 times faster than calling new Thread.
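As a footnote to point (1), the difference between Seconds and TotalSeconds can be seen without running any timing code. This is a small illustrative sketch using a hand-built TimeSpan of 61.5 seconds; the class name ElapsedDemo is hypothetical.

```csharp
using System;

public static class ElapsedDemo
{
    public static void Main()
    {
        // 61.5 seconds = 1 minute, 1 second and 500 milliseconds.
        TimeSpan elapsed = TimeSpan.FromSeconds(61.5);

        // Seconds is only the seconds component of the time (an int): here 1.
        Console.WriteLine(elapsed.Seconds);

        // TotalSeconds is the whole duration expressed in seconds (a double): here 61.5.
        Console.WriteLine(elapsed.TotalSeconds);
    }
}
```

Stopwatch.Elapsed is a TimeSpan, so the same two properties behave the same way on sw.Elapsed.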

Upvotes: 3

Saeb Amini

Reputation: 24400

Parallel utilizes however many threads the underlying scheduler provides, which would be the minimum number of threadpool threads to start with.

The number of minimum threadpool threads is by default set to the number of processors. As time goes on and based on many different factors, e.g. all current threads being busy, the scheduler might decide to spawn more threads and go higher than the minimum count.

All of that is managed for you to stop unnecessary resource usage. Your second example circumvents all that by spawning threads manually. If you explicitly set the number of threadpool threads e.g. ThreadPool.SetMinThreads(100, 100), you'll see even the Parallel one takes 3 seconds as it immediately has more threads available to use.
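A rough sketch of that experiment is below. The class and method names (MinThreadsDemo, TimeParallelSleep) are invented for illustration, and the I/O completion thread minimum is left at whatever the runtime reports rather than set to 100. The actual timings depend on the machine, so none are claimed here.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class MinThreadsDemo
{
    // Runs `count` iterations that each sleep for `sleepMs` and returns the wall-clock time.
    public static TimeSpan TimeParallelSleep(int count, int sleepMs)
    {
        var sw = Stopwatch.StartNew();
        Parallel.For(0, count, _ => Thread.Sleep(sleepMs));
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        ThreadPool.GetMinThreads(out int worker, out int io);
        Console.WriteLine("Processors: {0}, min worker threads: {1}",
            Environment.ProcessorCount, worker);

        // Comment this line out to measure the default behaviour instead.
        // SetMinThreads returns false if the request is rejected.
        bool changed = ThreadPool.SetMinThreads(100, io);
        Console.WriteLine("SetMinThreads accepted: {0}", changed);

        Console.WriteLine("Elapsed: {0:F1} s", TimeParallelSleep(10, 3000).TotalSeconds);
    }
}
```

Running the two variants in separate processes avoids the pool from the first measurement (which may already have grown) influencing the second.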

Upvotes: 3

Theraot

Reputation: 40180

Your snippets are not equivalent. Here is a version of ParallelThread that would do the same as ParallelProcess but starting new threads:

public static bool ParallelThread()
{
    Stopwatch sw = new Stopwatch();

    sw.Start();
    var threads = new Thread[10];
    for (int i = 0; i < 10; i++)
    {
        int x = i;
        threads[i] = new Thread(() => Thread1(x));
        threads[i].Start();
    }
    for (int i = 0; i < 10; i++)
    {
        threads[i].Join();
    }
    sw.Stop();
    Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));

    return true;
}

private static void Thread1(int x)
{
    Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
           Thread.CurrentThread.ManagedThreadId));
    Thread.Sleep(3000);
}

Here, I am making sure to wait for all the threads, and also to match the console output. These are things the OP's code does not do.

However, the time difference is still there.

Let me tell you what makes the difference, at least in my tests: the order. Run ParallelProcess before ParallelThread and they should both take 3 seconds to complete (ignoring the initial runs, which will take longer because of compilation). I cannot really explain it.

We could modify the above code further to use the ThreadPool, and that also resulted in ParallelProcess completing in 3 seconds (even though I did not modify that version). This is the version of ParallelThread with ThreadPool I came up with:

public static bool ParallelThread()
{
    Stopwatch sw = new Stopwatch();

    sw.Start();
    var events = new ManualResetEvent[10];
    for (int i = 0; i < 10; i++)
    {
        int x = i;
        events[x] = new ManualResetEvent(false);
        ThreadPool.QueueUserWorkItem
            (
                _ =>
                {
                    Thread1(x);
                    events[x].Set();
                }
            );
    }
    for (int i = 0; i < 10; i++)
    {
        events[i].WaitOne();
    }
    sw.Stop();
    Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));

    return true;
}

private static void Thread1(int x)
{
    Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
           Thread.CurrentThread.ManagedThreadId));
    Thread.Sleep(3000);
}

Note: We could use WaitAll on the events, but that would fail on a STAThread.
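As an aside, a different technique from the per-item ManualResetEvent array above is a single CountdownEvent, which needs only one Wait call and so should sidestep the multi-handle WaitAll restriction. This is just a sketch under that assumption, not the code used for the measurements in this answer; the names CountdownDemo and RunAndWait are hypothetical.

```csharp
using System;
using System.Threading;

public static class CountdownDemo
{
    // Queues `count` work items on the ThreadPool, blocks until every one has
    // signalled, and returns how many of them ran.
    public static int RunAndWait(int count, int sleepMs)
    {
        int completed = 0;
        using (var done = new CountdownEvent(count))
        {
            for (int i = 0; i < count; i++)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    Thread.Sleep(sleepMs);                 // stand-in for real work
                    Interlocked.Increment(ref completed);  // thread-safe counter
                    done.Signal();                         // one item finished
                });
            }
            done.Wait(); // returns once Signal has been called `count` times
        }
        return completed;
    }

    public static void Main()
    {
        Console.WriteLine("Completed items: {0}", RunAndWait(10, 100));
    }
}
```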


You have Thread.Sleep(3000) which are the 3 seconds we see. Meaning that we are not really measuring the overhead of any of these methods.

So, I decided to study this further. To do so, I went up one order of magnitude (from 10 to 100) and removed the Console.WriteLine (which introduces synchronization anyway).

This is my code listing:

void Main()
{
    ParallelThread();
    ParallelProcess();
}

public static bool ParallelProcess()
{
    Stopwatch sw = new Stopwatch();

    sw.Start();
    Parallel.For(0, 100, x =>
    {
        /*Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
            Thread.CurrentThread.ManagedThreadId));*/
        Thread.Sleep(3000);
    });
    sw.Stop();
    Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));

    return true;
}

public static bool ParallelThread()
{
    Stopwatch sw = new Stopwatch();

    sw.Start();
    var events = new ManualResetEvent[100];
    for (int i = 0; i < 100; i++)
    {
        int x = i;
        events[x] = new ManualResetEvent(false);
        ThreadPool.QueueUserWorkItem
            (
                _ =>
                {
                    Thread1(x);
                    events[x].Set();
                }
            );
    }
    for (int i = 0; i < 100; i++)
    {
        events[i].WaitOne();
    }
    sw.Stop();
    Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));

    return true;
}

private static void Thread1(int x)
{
    /*Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
           Thread.CurrentThread.ManagedThreadId));*/
    Thread.Sleep(3000);
}

I am getting 6 seconds for ParallelThread and 9 seconds for ParallelProcess. This remains true even after reversing the order, which makes me much more confident that this is a real measure of the overhead.

Adding ThreadPool.SetMinThreads(100, 100); brings the time back down to 3 seconds, for both ParallelThread (remember that this version is using the ThreadPool) and ParallelProcess. This means the overhead comes from the thread pool. Now, I can go back to the version that spawns new threads (modified to spawn 100 and with Console.WriteLine commented out):

public static bool ParallelThread()
{
    Stopwatch sw = new Stopwatch();

    sw.Start();
    var threads = new Thread[100];
    for (int i = 0; i < 100; i++)
    {
        int x = i;
        threads[i] = new Thread(() => Thread1(x));
        threads[i].Start();
    }
    for (int i = 0; i < 100; i++)
    {
        threads[i].Join();
    }
    sw.Stop();
    Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));

    return true;
}

private static void Thread1(int x)
{
    /*Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
           Thread.CurrentThread.ManagedThreadId));*/
    Thread.Sleep(3000);
}

I get a consistent 3 seconds from this version (meaning the time overhead is negligible, since, as I said earlier, Thread.Sleep(3000) is 3 seconds). However, I want to note that it leaves more garbage to collect than using the ThreadPool or Parallel.For. On the other hand, Parallel.For remains tied to the ThreadPool. By the way, if you want to degrade its performance, reducing the minimum number of threads is not enough; you have to reduce the maximum number of threads too (e.g. ThreadPool.SetMaxThreads(1, 1);).

All in all, please notice that Parallel.For is easier to use and harder to get wrong.


Invoking 10 threads would appear to be faster, can anyone please explain?

Spawning threads is fast, although it will lead to more garbage. Also, note that your test is not great.

Would threads use multiple processors available in the system (to get executed in parallel) or does it just do time slicing in reference to CLR?

Yes, they would. They map to the underlying operating system threads, can be preempted by it, and will run on any core according to their affinity (see ProcessThread.ProcessorAffinity). To be clear, they are neither fibers nor coroutines.

Upvotes: 1
