Reputation: 15
In my program i want to detemine how many numbers have 9 digits how, many have 8 digits, etc. with this loop:
for (int i = 0; i < 60000000; i++)
{
if (a[i] >= 1000000000) { p[10] += 1; }
else if (a[i] >= 100000000) { p[9] += 1; }
else if (a[i] >= 10000000) { p[8] += 1; }
else if (a[i] >= 1000000) { p[7] += 1; }
else if (a[i] >= 100000) { p[6] += 1; }
else if (a[i] >= 10000) { p[5] += 1; }
else if (a[i] >= 1000) { p[4] += 1; }
else if (a[i] >= 100) { p[3] += 1; }
else if (a[i] >= 10) { p[2] += 1; }
else { p[1] += 1; }
}
I parallelized the loop like this:
void partiton(int f, int l, int[] p)
{
Parallel.Invoke(()=>calc(f,l/2,p),()=>calc(l/2,l,p));
}
void calc(int f, int l, int[] p)
{
for (int i = f; i < l; i++)
{
if (a[i] >= 1000000000) { p[10] += 1; }
else if (a[i] >= 100000000) { p[9] += 1; }
else if (a[i] >= 10000000) { p[8] += 1; }
else if (a[i] >= 1000000) { p[7] += 1; }
else if (a[i] >= 100000) { p[6] += 1; }
else if (a[i] >= 10000) { p[5] += 1; }
else if (a[i] >= 1000) { p[4] += 1; }
else if (a[i] >= 100) { p[3] += 1; }
else if (a[i] >= 10) { p[2] += 1; }
else { p[1] += 1; }
}
}
private void button1_Click(object sender, EventArgs e)
{
Stopwatch w = new Stopwatch();
w.Restart();
int f = 0;
int l = 60000000;
Parallel.Invoke(() => calc(f, l/2, p), () => calc(l/2, l, p));
w.Stop();
label1.Text = w.Elapsed.ToString();
}
But the benchmark is: Sequential : 0.3834 Parallel : 0.6864
Why is the parallel code slower? Is there something wrong with my code? My cpu is AMD Phenom™ II X4. Model, 955.
Upvotes: 0
Views: 411
Reputation: 5953
It's all in the variables.
Take, for instance, your p
object. You pass the same p
object to both threads. Now, I'm not sure if Parallel.Invoke
is capable of detecting this, and as such is executing them in serial (albeit with significant overhead) or not, but I do know that if it is not detecting this, then you have a lot of attempts to read/write the same value in the same thread.
Now, I built a small, concrete example using your code as a base, and here's a copy of it. (Paste into any new console project, rename _Main
to Main
and run as you see fit.)
static int[] a = new int[100000000];
static void calc(int f, int l, int[] p, int[] a)
{
for (int i = f; i < l; i++)
{
if (a[i] >= 1000000000) { p[10] += 1; }
else if (a[i] >= 100000000) { p[9] += 1; }
else if (a[i] >= 10000000) { p[8] += 1; }
else if (a[i] >= 1000000) { p[7] += 1; }
else if (a[i] >= 100000) { p[6] += 1; }
else if (a[i] >= 10000) { p[5] += 1; }
else if (a[i] >= 1000) { p[4] += 1; }
else if (a[i] >= 100) { p[3] += 1; }
else if (a[i] >= 10) { p[2] += 1; }
else { p[1] += 1; }
}
}
public static void _Main(string[] args)
{
for (int i = 0; i < a.Length; i++)
{
a[i] = i;
}
int f = 0;
int l = a.Length;
int[] p = new int[10];
int[] p1 = new int[10];
int[] p2 = new int[10];
int[] p3 = new int[10];
int[] p4 = new int[10];
int[] a1 = new int[l / 2];
int[] a2 = new int[l / 2];
int[] a11 = new int[l / 4];
int[] a12 = new int[l / 4];
int[] a13 = new int[l / 4];
int[] a14 = new int[l / 4];
for (int i = 0; i < a.Length; i++)
if (i >= l / 2)
a2[i - l / 2] = a[i];
else
a1[i] = a[i];
for (int i = 0; i < a.Length; i++)
if (i >= l / 4 * 3)
a14[i - l / 4 * 3] = a[i];
else if (i >= l / 4 * 2)
a13[i - l / 4 * 2] = a[i];
else if (i >= l / 4 * 1)
a12[i - l / 4] = a[i];
else
a14[i] = a[i];
int rc = 5;
for (int d = 0; d < rc; d++)
{
Stopwatch w = new Stopwatch();
w.Start();
Parallel.Invoke(() => calc(f, l / 2, p1, a1), () => calc(f, l / 2, p2, a2));
w.Stop();
Console.WriteLine("Attempt {0}/{1}: {2}", 1, d, w.ElapsedMilliseconds);
w.Reset();
w.Start();
Parallel.Invoke(() => calc(f, l / 4, p1, a11), () => calc(f, l / 4, p2, a12), () => calc(f, l / 4, p3, a13), () => calc(f, l / 4, p4, a14));
w.Stop();
Console.WriteLine("Attempt {0}/{1}: {2}", 2, d, w.ElapsedMilliseconds);
w.Reset();
w.Start();
Parallel.Invoke(() => calc(f, l / 2, p, a), () => calc(l / 2, l, p, a));
w.Stop();
Console.WriteLine("Attempt {0}/{1}: {2}", 3, d, w.ElapsedMilliseconds);
w.Reset();
w.Start();
calc(f, l, p, a);
w.Stop();
Console.WriteLine("Attempt {0}/{1}: {2}", 4, d, w.ElapsedMilliseconds);
}
}
I'm sure there are more optimizations I could run. (Converting the if
s to while
loops, for example.) I also cannot guarantee the accuracy of it. I merely took your logic and adopted proper debugging to it.
But when I ran this exact example on my PC, I had the following results:
a
and p
variables into two separate variables for each.a
and p
variables into four separate variables for each.calc
method.calc
method synchronously.Why so big a difference? Attempts 1 and 2 split the processed data into variables that do not require threads to synchronize, whereas Attempt 3 forces the two threads to use the same variables, creating clashes, and as Ron Beyer said, context switches.
Basically, if you are going to try to write to the same anything in parallel, you should localize the data each thread is writing and merge changes in the end.
Upvotes: 4
Reputation: 646
When you try to make an algorithm multithreaded, you need to try to separate the tasks the way they don't use shared memory. As a kind of micro optimization - which is BAD - you can try to allocate the memory the way that they are not next to each other, otherwise the before mentioned cache problem will slow down the processing.
Upvotes: 1