Reputation: 6019
I enumerate through a bitarray setting every second bit to false.
Now I'd like to speed this up by splitting it up into two threads.. for some weird reason though, the time per Thread to do the HALF amount of work takes 64% MORE time, and I wonder why's that?
Could this be due to some kind of CPU caching effect? How do I do this properly?
I have tried 8 threads too previously with lambda expressions but it was always around ~1400 ms, however in single threading I constandly got 850 ms. Also when I let a single thread do all the work, it took me 830 ms. I just don't understand, anyone knowing the cause for that here?
Code:
class Program
{
static int count = 0x10000000;
static int half = count / 2;
static BitArray bitArray = new BitArray(count);
static unsafe void Main(string[] args)
{
Stopwatch sw = Stopwatch.StartNew();
#if SINGLE
for (int i = 0; i < bitArray.Count; i += 2)
bitArray.Set(i, true);
#else
Thread thread1 = new Thread(Thread1);
Thread thread2 = new Thread(Thread2);
thread1.Start();
thread2.Start();
thread1.Join();
thread2.Join();
#endif
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
Console.ReadLine();
}
static void Thread1()
{
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < half; i += 2)
bitArray.Set(i, true);
sw.Stop();
Console.WriteLine("Thread1: {0}", sw.ElapsedMilliseconds);
}
static void Thread2()
{
Stopwatch sw = Stopwatch.StartNew();
for (int i = half; i < count; i += 2)
bitArray.Set(i, true);
sw.Stop();
Console.WriteLine("Thread2: {0}", sw.ElapsedMilliseconds);
}
}
Upvotes: 3
Views: 1228
Reputation: 16107
I modified your code so that the test runs 10 times and reports the results. Using your code, I see similar timings for the single vs multithreaded tests (each thread is taking around 1200ms).
However, as others have said, your use of a single BitArray from multiple threads is not guaranteed to not cause contention between the threads.
This is most simply demonstrated by giving each thread its own BitArray instead of using a shared static one. With this approach, I typically see each thread taking around 450ms, although occasionally still seeing larger times:
Thread2: 415
Thread1: 420
447
Thread2: 414
Thread1: 420
496
Thread1: 1185
Thread2: 1198
1249
Thread1: 417
Thread2: 421
455
Thread1: 420
Thread2: 415
455
Thread2: 413
Thread1: 417
491
Thread2: 413
Thread1: 417
508
Thread2: 417
Thread1: 441
526
Thread1: 420
Thread2: 415
465
Thread1: 940
Thread2: 1005
1087
Ultimately I think what this is showing is that:
Upvotes: 1
Reputation: 3230
Because threading isn't so easy as it seems.
First of all, as stated by the documentation, BitArrays are NOT thread-safe. This means they might and will behave unpredictably, when used concurrently by multiple threads.
As for the performance penalty, it's probably due to the increased bus traffic, which is caused by your two threads, trying to concurrently modify the same memory locations, continuosly invalidating each others' caches.
Even though you think your threads are not modifying the same locations, that might not be true. For instance, BitArray has got a Count
property. It's very likely that, each time you set a bit to 1, the thread modifies a counter variable, that's backing the Count
property. This concurrent modification is dangerous, due to race conditions and stale values, and might increase the bus traffic, as I described before.
The matter is that a CPU core works at 2-3GHz, while the RAM and the memory bus work at 1Ghz. The ram is much slower, a RAM access requires about 100 CPU cycles. If you break the caching mechanisms of the CPU, it's obvious that the performance will decrease.
EDIT: not to mention that thread creation and joining involves a significant overhead. If your work is 830ms one-shot. It's unlikely that you can obtain significant improvements with multithreading. You should also try to get rid of the Stopwatches in the threads, because they are an overhead, too.
Upvotes: 2
Reputation: 4591
BitArray is not a thread-safe class. You should not use it like that. And in fact, beside correctness this is most probably the cause of slowness. Here's why:
If you look at the source code of BitArray, it contains an int version
field, which is updated at every operation, notably Set()
, which you call.
This means that every thread continuously updates the same memory location, which is a huge perf killer because all cores have to communicate and synchronize when accessing this location. In this condition it makes perfect sense that a multithreaded solution has worse performance than the single core one.
Upvotes: 9