Reputation:
I am trying to write faster code for a Sobel filter, but I don't understand how to use a parallel for when there are several nested for loops.
Should I use as many parallel for loops as there are loops?
Would that be effective?
Can somebody explain it using my code? Here it is:
for (int y = 0; y < Image.Height; y++)
{
    for (int x = 0; x < Image.Width * 3; x += 3)
    {
        r_x = g_x = b_x = 0; //reset the gradients in x-direction values
        r_y = g_y = b_y = 0; //reset the gradients in y-direction values
        location = x + y * ImageData.Stride; //to get the location of any pixel >> location = x + y * Stride
        for (int yy = -(int)Math.Floor(weights_y.GetLength(0) / 2.0d), yyy = 0; yy <= (int)Math.Floor(weights_y.GetLength(0) / 2.0d); yy++, yyy++)
        {
            if (y + yy >= 0 && y + yy < Image.Height) //to prevent crossing the bounds of the array
            {
                for (int xx = -(int)Math.Floor(weights_x.GetLength(1) / 2.0d) * 3, xxx = 0; xx <= (int)Math.Floor(weights_x.GetLength(1) / 2.0d) * 3; xx += 3, xxx++)
                {
                    if (x + xx >= 0 && x + xx <= Image.Width * 3 - 3) //to prevent crossing the bounds of the array
                    {
                        location2 = x + xx + (yy + y) * ImageData.Stride; //to get the location of any pixel >> location = x + y * Stride
                        sbyte weight_x = weights_x[yyy, xxx];
                        sbyte weight_y = weights_y[yyy, xxx];
                        //applying the same weight to all channels
                        b_x += buffer[location2] * weight_x;
                        g_x += buffer[location2 + 1] * weight_x; //G_X
                        r_x += buffer[location2 + 2] * weight_x;
                        b_y += buffer[location2] * weight_y;
                        g_y += buffer[location2 + 1] * weight_y; //G_Y
                        r_y += buffer[location2 + 2] * weight_y;
                    }
                }
            }
        }
        //getting the magnitude for each channel
        b = (int)Math.Sqrt(Math.Pow(b_x, 2) + Math.Pow(b_y, 2));
        g = (int)Math.Sqrt(Math.Pow(g_x, 2) + Math.Pow(g_y, 2)); //G
        r = (int)Math.Sqrt(Math.Pow(r_x, 2) + Math.Pow(r_y, 2));
        if (b > 255) b = 255;
        if (g > 255) g = 255;
        if (r > 255) r = 255;
        //getting grayscale value
        grayscale = (b + g + r) / 3;
        //thresholding to clean up the background
        //if (grayscale < 80) grayscale = 0;
        buffer2[location] = (byte)grayscale;
        buffer2[location + 1] = (byte)grayscale;
        buffer2[location + 2] = (byte)grayscale;
        //thresholding to clean up the background
        //if (b < 100) b = 0;
        //if (g < 100) g = 0;
        //if (r < 100) r = 0;
        //buffer2[location] = (byte)b;
        //buffer2[location + 1] = (byte)g;
        //buffer2[location + 2] = (byte)r;
    }
}
Upvotes: 2
Views: 128
Reputation: 1063058
The most important questions are: is the work trivially parallelizable, and does the object model you're using support concurrency? Things that are purely math related and where the outcomes aren't cumulative tend to be parallelizable, but I can't comment on the object model's thread-safety. It isn't guaranteed (and the default is usually "no").
As for where:
There is very little point in having nested parallelism; parallelism has overheads, and magnifying those overheads is counter-productive. The most effective way to treat parallelism is to think "chunky", i.e. a relatively small number of non-trivial operations (but hopefully at least as many as available CPU cores), rather than huge numbers of trivial operations. As such, the most effective place to put parallelism is usually the outermost loop. In this case, this seems to map to rows in an image, which seems a reasonable way to partition image processing. You could partition it into Nths (for CPU cores N), but honestly: I suspect rows will work just fine, and that keeps things simple.
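As a rough illustration of "parallelize only the outermost loop", here is a minimal sketch. The class name RowParallelSketch, the ProcessRow helper and its byte-inverting body are all hypothetical stand-ins for the real per-row work; only Parallel.For, Parallel.ForEach and Partitioner.Create are actual .NET APIs. PerRow uses one parallel iteration per image row, while PerChunk shows the "Nths" alternative with roughly one contiguous block of rows per core.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class RowParallelSketch
{
    // Hypothetical per-row work: inverts every byte of one row.
    // It touches only row y of the output, so rows never interfere.
    static void ProcessRow(byte[] input, byte[] output, int y, int stride)
    {
        int rowStart = y * stride;
        for (int i = rowStart; i < rowStart + stride; i++)
            output[i] = (byte)(255 - input[i]);
    }

    // Only the outermost loop is parallel; the inner loop stays sequential.
    public static void PerRow(byte[] input, byte[] output, int height, int stride)
    {
        Parallel.For(0, height, y => ProcessRow(input, output, y, stride));
    }

    // Alternative: roughly one contiguous chunk of rows per CPU core.
    public static void PerChunk(byte[] input, byte[] output, int height, int stride)
    {
        if (height <= 0) return; // Partitioner.Create rejects an empty range

        int cores = Environment.ProcessorCount;
        int rowsPerChunk = Math.Max(1, (height + cores - 1) / cores);

        Parallel.ForEach(Partitioner.Create(0, height, rowsPerChunk), range =>
        {
            // range.Item1 is inclusive, range.Item2 is exclusive.
            for (int y = range.Item1; y < range.Item2; y++)
                ProcessRow(input, output, y, stride);
        });
    }
}
```

Both variants should give the same output; which one is faster depends on the image size and the cost of each row, so it is worth measuring rather than assuming.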
However! Note that you need to avoid shared state: r_x, g_x and b_x and the same for the _y parts (and any other shared locals) would need to be declared inside the parallel part, to ensure that they are independent. Other things to look at: grayscale, location, location2, r, g, b, yyy, xxx. It would be good to see where these things are currently declared, but my suspicion is that they'd all need to be moved so that they're declared inside the parallel portion. Check all locals that are declared, and all fields that are accessed.
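To make the "declare everything inside the parallel part" advice concrete, here is a hedged sketch of the question's loop with the outer row loop turned into Parallel.For. Several things are assumptions rather than facts from the question: the class name ParallelSobelSketch, the standard 3x3 Sobel kernels (the question does not show the contents of weights_x and weights_y), a 3-bytes-per-pixel layout, and passing width, height and stride in as plain ints instead of reading them from Image and ImageData inside the loop. Every accumulator and index is declared inside the lambda, so each row gets its own copies.

```csharp
using System;
using System.Threading.Tasks;

static class ParallelSobelSketch
{
    // Assumed kernels: the standard 3x3 Sobel weights.
    static readonly sbyte[,] WeightsX = { { -1, 0, 1 }, { -2, 0, 2 }, { -1, 0, 1 } };
    static readonly sbyte[,] WeightsY = { { -1, -2, -1 }, { 0, 0, 0 }, { 1, 2, 1 } };

    // buffer is only read; buffer2 must be a different array and is only written.
    public static void Apply(byte[] buffer, byte[] buffer2, int width, int height, int stride)
    {
        Parallel.For(0, height, y =>
        {
            for (int x = 0; x < width * 3; x += 3)
            {
                // Declared inside the lambda: private to this iteration.
                int bX = 0, gX = 0, rX = 0;
                int bY = 0, gY = 0, rY = 0;
                int location = x + y * stride;

                for (int ky = 0; ky < 3; ky++)
                {
                    int sy = y + ky - 1;
                    if (sy < 0 || sy >= height) continue; // skip rows outside the image

                    for (int kx = 0; kx < 3; kx++)
                    {
                        int sx = x + (kx - 1) * 3;
                        if (sx < 0 || sx > width * 3 - 3) continue; // skip columns outside the image

                        int location2 = sx + sy * stride;
                        int wx = WeightsX[ky, kx];
                        int wy = WeightsY[ky, kx];

                        bX += buffer[location2] * wx;
                        gX += buffer[location2 + 1] * wx;
                        rX += buffer[location2 + 2] * wx;
                        bY += buffer[location2] * wy;
                        gY += buffer[location2 + 1] * wy;
                        rY += buffer[location2 + 2] * wy;
                    }
                }

                // Gradient magnitude per channel, clamped to the byte range.
                int b = Math.Min(255, (int)Math.Sqrt(bX * bX + bY * bY));
                int g = Math.Min(255, (int)Math.Sqrt(gX * gX + gY * gY));
                int r = Math.Min(255, (int)Math.Sqrt(rX * rX + rY * rY));

                byte grayscale = (byte)((b + g + r) / 3);
                buffer2[location] = grayscale;
                buffer2[location + 1] = grayscale;
                buffer2[location + 2] = grayscale;
            }
        });
    }
}
```

In this sketch the only data shared between iterations are the two arrays and the read-only kernels: the source array is never written, and each row writes only its own slice of the output array, which is why no locking is used.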
It looks like buffer and buffer2 are simply input/output arrays, in which case they should work OK here.
Upvotes: 4