Reputation: 811
I am running image analysis code on an array storing information about the image. Unfortunately the code is very heavy and takes an average of 25 s to run through a single frame. The main problem I see is the array addressing. Which is the fastest way to run through a 2D array, and is there any difference between
horizontal then vertical
for (int y = 0; y < array.Length; ++y)
    for (int x = 0; x < array[y].Length; ++x)
        // Code using array[y][x]
and vertical then horizontal?
for (int x = 0; x < array[0].Length; ++x)
    for (int y = 0; y < array.Length; ++y)
        // Code using array[y][x]
Furthermore, I tried to avoid direct addressing and use pointers instead.
for (int y = 0; y < array.Length; ++y)
    fixed (int* ptrRow = array[y])
    {
        int* ptrArray = ptrRow;
        for (int x = 0; x < array[y].Length; ++x, ++ptrArray)
            // Code using *ptrArray for array[y][x]
    }
or
fixed (int* ptrBase = array[0])       // only valid if all rows share one buffer
    for (int x = 0; x < array[0].Length; ++x)
    {
        int* ptrArray = ptrBase + x;
        for (int y = 0; y < array.Length; ++y, ptrArray += array[0].Length)
            // Code using *ptrArray for array[y][x]
    }
Any help is greatly appreciated. Max
Upvotes: 13
Views: 9539
Reputation: 113242
The most important rule is that it's all theory until you profile. I don't hold with those who insist that profiling is everything (without some theory you're no better than a Cargo Cultist putting coconuts on their ears and waiting for the plane to come) but your theory can always be wrong or incomplete, so profiling is crucial.
Generally, we want the inner loop to scan horizontally (in terms of the array, rather than the image, though for most formats that's the same). The reason is that with an array like:
00 01 02 03 04 05 06 07 08 09
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
It is going to be laid out as:
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
You want to be scanning along contiguous blocks that can be loaded into CPU caches and then used entirely, rather than scanning from block to block and needing to regularly change the CPU cache contents.
This is even more important if you try to parallelise the algorithm. You want each thread dealing with its own contiguous blocks of memory, as far as both input and output go, rather than not only suffering the poor cache-hit frequency that single-threaded code does, but also causing each other's cache lines to be dirtied and need refreshing. This can be the difference between parallelising leading to a speed boost and parallelising actually slowing things down.
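As a sketch of that row-per-thread idea (assuming a jagged int[][] named array and a placeholder Process method, not your actual code):

```csharp
using System.Threading.Tasks;

// Each worker handles whole rows, so every thread reads and writes its own
// contiguous block and threads do not dirty each other's cache lines.
Parallel.For(0, array.Length, y =>
{
    int[] row = array[y];              // local reference, hoists array[y]
    for (int x = 0; x < row.Length; ++x)
        row[x] = Process(row[x]);      // Process is a stand-in for the real work
});
```

This only pays off if the work on each row is independent of the other rows.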
Another thing is the difference between a 2-dimensional array byte[,]
rather than an array of arrays byte[][]
, which your comment in your question "array[y][x]" makes me wonder if perhaps you are using. With the former, to obtain arr[1,2] the logic is: compute the offset 1 * width + 2, bounds-check it, and read from the single block of memory.
With the latter, the logic is: bounds-check index 1, fetch the reference to the inner array arr[1], bounds-check index 2, then read from that separately allocated array; two dereferences instead of one.
There is also worse memory cache-hit frequency, since each row is a separate allocation. The latter has benefits when "jagged" structures are needed, but that is not the case here. 2D is almost always faster than an array of arrays.
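To make the two layouts concrete (a sketch, with width and height assumed):

```csharp
byte[,] rect = new byte[height, width];    // one contiguous allocation
byte a = rect[1, 2];                       // offset 1 * width + 2, single block

byte[][] jagged = new byte[height][];
for (int y = 0; y < height; ++y)
    jagged[y] = new byte[width];           // every row is its own allocation
byte b = jagged[1][2];                     // fetch row reference, then index it
```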
Things I don't see as likely to help, but I would certainly try them in your situation:
You may find a boost from doing your own 1D <=> 2D index logic: have a single-dimension array where idx = y * width + x. It shouldn't make an appreciable difference, but it's worth trying.
The JIT's optimisations do try to both hoist calls to .Length
and omit needless bounds checking, so you may find that manually hoisting and switching to pointer arithmetic gain nothing, but in a case where you really do need to bring time down they are certainly worth profiling.
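The two suggestions above can be sketched together (data, width, height, and Process are assumed names, and Process is a placeholder for the real per-pixel work; profile before adopting either):

```csharp
int[] data = new int[width * height];   // 1D storage for a width x height image

// Plain indexing: arr[y][x] becomes data[y * width + x].
for (int y = 0; y < height; ++y)
{
    int rowStart = y * width;           // hoisted: one multiply per row
    for (int x = 0; x < width; ++x)
        data[rowStart + x] = Process(data[rowStart + x]);
}

// Pointer variant (requires compiling with /unsafe): no per-element bounds check.
unsafe
{
    fixed (int* pData = data)           // pin so the GC cannot move the array
    {
        for (int* p = pData, pEnd = pData + data.Length; p < pEnd; ++p)
            *p = Process(*p);
    }
}
```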
Finally: have you profiled how fast your code is at scanning the array and doing nothing else? It could be that another part of the code is the real bottleneck, and you're fixing the wrong thing.
Upvotes: 3
Reputation:
Always make sure that your innermost loop accesses contiguous memory.
This is usually a row of your image. Note that in rectangular arrays, you should make this the last index: array[y,x].
This paper suggests that the built-in C# rectangular arrays (with the multiple indexes) are quite slow. I read this before, but this is the only reference I have. I would start with a linear array, and calculate an offset once for each row. Unmanaged code will only help you in really trivial cases.
If a single frame takes 25 seconds, then it is either huge, or you are doing very complex processing. In that case it is only worth spending effort on optimizing memory access if you access many input pixels for each output pixel.
Upvotes: 0
Reputation: 417
If it is possible, try to reallocate your array so that the first dimension is smaller than the second one. It would speed things up drastically. Another solution is to move the data into a single-dimension array, as was proposed above.
Upvotes: 0
Reputation: 62093
Can you go unsafe? Pointers. The problem with arrays is that you STILL have the bounds checks on every access. Pointers remove those. Note that this is fully supported in C#, but you need to put it into an unsafe block. It also means you must be ABLE to run unsafe code, which is not always a given.
http://msdn.microsoft.com/en-us/library/28k1s2k6.aspx
has a code sample.
Upvotes: 0
Reputation: 59101
I have no idea, but you've already come up with the examples. So you could run your code samples in a loop and profile it yourself.
var sw = new Stopwatch();
sw.Start();
ExecuteMyCode();
sw.Stop();
Console.WriteLine("Time: " + sw.Elapsed);
You might be able to speed up your processing by using a multi-threading construct like Parallel.ForEach
. This would work well if the code in your loop avoids dependencies between loop iterations.
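For example, a sketch using Parallel.ForEach with a range partitioner, so each worker gets a contiguous chunk of rows (array and Transform are assumed names; Transform stands in for the per-pixel work):

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Partitioner.Create(0, n) splits the row range into contiguous chunks.
Parallel.ForEach(Partitioner.Create(0, array.Length), range =>
{
    for (int y = range.Item1; y < range.Item2; ++y)
        for (int x = 0; x < array[y].Length; ++x)
            array[y][x] = Transform(array[y][x]);
});
```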
Upvotes: 1
Reputation: 25619
One option is to use reverse looping (start your for() loop
at array.Length - 1
and count down to 0). That'll speed things up a bit, since comparing against zero is cheaper than comparing against a variable bound.
for example,
for (int y = array.Length - 1; y >= 0; --y)
    for (int x = array[y].Length - 1; x >= 0; --x)
        // Code using array[y][x]
Upvotes: 3