Reputation: 6637
I am struggling to understand the difference between threads and Parallel.For. I created two functions, one used Parallel.For other invoked threads. Invoking 10 threads would appear to be faster, can anyone please explain? Would threads use multiple processors available in the system (to get executed in parallel) or does it just do time slicing in reference to CLR?
public static bool ParallelProcess()
{
Stopwatch sw = new Stopwatch();
sw.Start();
Parallel.For(0, 10, x =>
{
Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
Thread.CurrentThread.ManagedThreadId));
Thread.Sleep(3000);
});
sw.Stop();
Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));
return true;
}
public static bool ParallelThread()
{
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 10; i++)
{
Thread t = new Thread(new ThreadStart(Thread1));
t.Start();
if (i == 9)
t.Join();
}
sw.Stop();
Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));
return true;
}
private static void Thread1()
{
Console.WriteLine(string.Format("Printing {0} thread = {1}", 0,
Thread.CurrentThread.ManagedThreadId));
Thread.Sleep(3000);
}
When called below methods, Parallel.For took twice time then threads.
Algo.ParallelThread(); //took 3 secs
Algo.ParallelProcess(); //took 6 secs
Upvotes: 2
Views: 994
Reputation: 1943
To put it in the simplest of the simplest terms, using Thread
class guarantees to create a thread on the operating system level but using the Parallel.For
the CLR thinks twice before spawning the OS-level threads. If it feels that it is a good time to create thread on OS-level, it goes ahead, otherwise it employs the available Thread pool. TPL is written to be optimized with a multi-core environment.
Upvotes: 0
Reputation: 117064
You've got a bunch of things here that are going wrong.
(1) Don't use sw.Elapsed.Seconds
this value is an int
and (obviously) truncates the fractional part of the time. Worse though, if you have a process that takes 61 seconds to complete this will report 1
as it's like the second hand on a clock. You should instead use sw.Elapsed.TotalSeconds
which reports as a double
and it shows the total number of seconds regardless how many minutes or hours, etc.
(2) Parallel.For
uses the thread-pool. This significantly reduces (or even eliminates) the overhead for creating threads. Each time you call new Thread(() => ...)
you are allocating over 1MB of RAM and chewing up valuable resources before any processing can take place.
(3) You're artificially loading up the threads with Thread.Sleep(3000);
and this means you are overshadowing the actual time it takes to create threads with a massive sleep.
(4) Parallel.For
is, by default, limited by the number of cores on your CPU. So when you run 10 threads the work is being cut in to two steps - meaning that the Thread.Sleep(3000);
is being run twice in series, hence the 6 seconds that it's running. The new Thread
approach is running all of the threads in one go meaning that it takes just over 3 seconds, but again, the Thread.Sleep(3000);
is swamping the thread start up time.
(5) You're also dealing with a CLR JIT issue. The first time you run your code the start-up costs are enormous. Let's change the code to remove the sleeps and to properly join the threads:
public static bool ParallelProcess()
{
Stopwatch sw = new Stopwatch();
sw.Start();
Parallel.For(0, 10, x =>
{
Console.WriteLine(string.Format("Printing {0} thread = {1}", x, Thread.CurrentThread.ManagedThreadId));
});
sw.Stop();
Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.TotalMilliseconds));
return true;
}
public static bool ParallelThread()
{
Stopwatch sw = new Stopwatch();
sw.Start();
var threads = Enumerable.Range(0, 10).Select(x => new Thread(new ThreadStart(Thread1))).ToList();
foreach (var thread in threads) thread.Start();
foreach (var thread in threads) thread.Join();
sw.Stop();
Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.TotalMilliseconds));
return true;
}
private static void Thread1()
{
Console.WriteLine(string.Format("Printing {0} thread = {1}", 0, Thread.CurrentThread.ManagedThreadId));
}
Now, to get rid of the CLR/JIT start up time, let's run the code like this:
ParallelProcess();
ParallelThread();
ParallelProcess();
ParallelThread();
ParallelProcess();
ParallelThread();
The times we get are like this:
Time in secs 3.8617 Time in secs 4.7719 Time in secs 0.3633 Time in secs 1.6332 Time in secs 0.3551 Time in secs 1.6148
The starting run times are terrible compared to the second and third runs that are far more consistent.
The result is that running Parallel.For
is 4 to 5 times faster than calling new Thread
.
Upvotes: 3
Reputation: 24400
Parallel
utilizes however many threads the underlying scheduler provides, which would be the minimum number of threadpool threads to start with.
The number of minimum threadpool threads is by default set to the number of processors. As time goes on and based on many different factors, e.g. all current threads being busy, the scheduler might decide to spawn more threads and go higher than the minimum count.
All of that is managed for you to stop unnecessary resource usage. Your second example circumvents all that by spawning threads manually. If you explicitly set the number of threadpool threads e.g. ThreadPool.SetMinThreads(100, 100)
, you'll see even the Parallel one takes 3 seconds as it immediately has more threads available to use.
Upvotes: 3
Reputation: 40180
Your snippets are not equivalent. Here is a version of ParallelThread
that would do the same as ParallelProcess
but starting new threads:
public static bool ParallelThread()
{
Stopwatch sw = new Stopwatch();
sw.Start();
var threads = new Thread[10];
for (int i = 0; i < 10; i++)
{
int x = i;
threads[i] = new Thread(() => Thread1(x));
threads[i].Start();
}
for (int i = 0; i < 10; i++)
{
threads[i].Join();
}
sw.Stop();
Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));
return true;
}
private static void Thread1(int x)
{
Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
Thread.CurrentThread.ManagedThreadId));
Thread.Sleep(3000);
}
Here, I am making sure to wait for all the threads. And also, I making sure to match the console output. Things that OP code does not do.
However, the time difference is still there.
Let me tell you what makes the difference, at least in my tests: the order. Run ParallelProcess
before ParallelThread
and they should both take 3 seconds to complete (ignoring the initial runs, which will take longer because of compilation). I cannot really explain it.
We could modify the above code futher to use the ThreadPool
, and that did also result in ParallelProcess
completing in 3 seconds (even though I did not modify that version). This is the version of ParallelThread
with ThreadPool
I came up with:
public static bool ParallelThread()
{
Stopwatch sw = new Stopwatch();
sw.Start();
var events = new ManualResetEvent[10];
for (int i = 0; i < 10; i++)
{
int x = i;
events[x] = new ManualResetEvent(false);
ThreadPool.QueueUserWorkItem
(
_ =>
{
Thread1(x);
events[x].Set();
}
);
}
for (int i = 0; i < 10; i++)
{
events[i].WaitOne();
}
sw.Stop();
Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));
return true;
}
private static void Thread1(int x)
{
Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
Thread.CurrentThread.ManagedThreadId));
Thread.Sleep(3000);
}
Note: We could use WaitAll
on the events, but that would fail on a STAThread
.
You have Thread.Sleep(3000)
which are the 3 seconds we see. Meaning that we are not really measuring the overhead of any of these methods.
So, I decided I want to study this futher, and to do it, I went up one order of magnitud (from 10 to 100) and removed the Console.WriteLine
(which is introducing synchronization anyway).
This is my code listing:
void Main()
{
ParallelThread();
ParallelProcess();
}
public static bool ParallelProcess()
{
Stopwatch sw = new Stopwatch();
sw.Start();
Parallel.For(0, 100, x =>
{
/*Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
Thread.CurrentThread.ManagedThreadId));*/
Thread.Sleep(3000);
});
sw.Stop();
Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));
return true;
}
public static bool ParallelThread()
{
Stopwatch sw = new Stopwatch();
sw.Start();
var events = new ManualResetEvent[100];
for (int i = 0; i < 100; i++)
{
int x = i;
events[x] = new ManualResetEvent(false);
ThreadPool.QueueUserWorkItem
(
_ =>
{
Thread1(x);
events[x].Set();
}
);
}
for (int i = 0; i < 100; i++)
{
events[i].WaitOne();
}
sw.Stop();
Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));
return true;
}
private static void Thread1(int x)
{
/*Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
Thread.CurrentThread.ManagedThreadId));*/
Thread.Sleep(3000);
}
I am getting 6 seconds for ParallelThread
and 9 seconds for ParallelProcess
. This remains true even after reversing the order. Which makes me much more confident that this is a real measure of the overhead.
Adding ThreadPool.SetMinThreads(100, 100);
bring the time back down to 3 seconds, for both ParallelThread
(remember that this version is using the ThreadPool
) and ParallelProcess
. Meaning that this overhead comes from the thread pool. Now, I can go back to the version that spawns new threads (modified to spawn 100 and with Console.WriteLine
commented):
public static bool ParallelThread()
{
Stopwatch sw = new Stopwatch();
sw.Start();
var threads = new Thread[100];
for (int i = 0; i < 100; i++)
{
int x = i;
threads[i] = new Thread(() => Thread1(x));
threads[i].Start();
}
for (int i = 0; i < 100; i++)
{
threads[i].Join();
}
sw.Stop();
Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));
return true;
}
private static void Thread1(int x)
{
/*Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
Thread.CurrentThread.ManagedThreadId));*/
Thread.Sleep(3000);
}
I get consistent 3 seconds from this version (meaning the time overhead is negligible, since, as I said earlier, Thread.Sleep(3000)
is 3 seconds), however I want to note that it would be leaving more garbage to collect than using the ThreadPool
or Parallel.For
. On the other hand, using Parallel.For
remains tied to the ThreadPool
. By the way, if you want to degrade its performance, reducing the minimun number of threads is not enough, you got to degreade the maximun number of threads too (e.g. ThreadPool.SetMaxThreads(1, 1);
).
All in all, please notice that Parallel.For
is easier to use, and harder to wrong.
Invoking 10 threads would appear to be faster, can anyone please explain?
Spawning threads is fast. Although, it will leade to more garbage. Also, note that your test is not great.
Would threads use multiple processors available in the system (to get executed in parallel) or does it just do time slicing in reference to CLR?
Yes, they would. They map to the underlaying operating system threads, can be preempted by it, and will run in any core according to their affinity (see ProcessThread.ProcessorAffinity
). To be clear, they are not fibers nor coroutines.
Upvotes: 1