Reputation: 23
This is a piece of my code, which calculate the differentiate. It works correctly but it takes a lot (because of height and width).
I am looking to speed up this code.
I also tried with using parallel, for but it didn't work correct (error with out of bounds).
private float[,] Differentiate(int[,] Data, int[,] Filter)
{
int i, j, k, l, Fh, Fw;
Fw = Filter.GetLength(0);
Fh = Filter.GetLength(1);
float sum = 0;
float[,] Output = new float[Width, Height];
for (i = Fw / 2; i <= (Width - Fw / 2) - 1; i++)
{
for (j = Fh / 2; j <= (Height - Fh / 2) - 1; j++)
{
sum=0;
for(k = -Fw/2; k <= Fw/2; k++)
{
for(l = -Fh/2; l <= Fh/2; l++)
{
sum = sum + Data[i+k, j+l] * Filter[Fw/2+k, Fh/2+l];
}
}
Output[i,j] = sum;
}
}
return Output;
}
Upvotes: 0
Views: 860
Reputation: 12619
For parallel execution you need to drop c language like variable declaration at the beginning of method and declare them in actual scope that they are used so they are not shared between threads. Making it parallel should provide some benefit for performance, but making them all ParallerFors is not a good idea as there is a limit for threads amount that actually can run in parallel. I would try to make it with top level loop only:
private static float[,] Differentiate(int[,] Data, int[,] Filter)
{
var Fw = Filter.GetLength(0);
var Fh = Filter.GetLength(1);
float[,] Output = new float[Width, Height];
Parallel.For(Fw / 2, Width - Fw / 2 - 1, (i, state) =>
{
for (var j = Fh / 2; j <= (Height - Fh / 2) - 1; j++)
{
var sum = 0;
for (var k = -Fw / 2; k <= Fw / 2; k++)
{
for (var l = -Fh / 2; l <= Fh / 2; l++)
{
sum = sum + Data[i + k, j + l] * Filter[Fw / 2 + k, Fh / 2 + l];
}
}
Output[i, j] = sum;
}
});
return Output;
}
Upvotes: 3
Reputation: 57
This is a perfect example of a task where using the GPU is better than using the CPU. A GPU is able to perform trillions of floating point operations per second (TFlops), while CPU performance is still measured in GFlops. The catch is that it's only any good if you use SIMD instructions (Single Instruction Multiple Data). The GPU excels at data-parallel tasks. If different data needs different instructions, using the GPU has no advantage.
In your program, the elements of your bitmap go through the same calculations: the same computations just with slightly different data (SIMD!). So using the GPU is a great option. This won't be too complex because with your calculations threads on the GPU would not need to exchange information, nor would they be dependent on results of previous iterations (Each element would be processed by a different thread on the GPU).
You can use, for example, OpenCL to easily access the GPU. More on OpenCL and using the GPU here: https://www.codeproject.com/Articles/502829/GPGPU-image-processing-basics-using-OpenCL-NET
Upvotes: 2