Pier Giorgio Misley

Reputation: 5351

how to remove nested foreach loops for performance improvement

I have a performance-related question. Is there a way to remove the nested foreach loops, replacing them with something more performant? Here is an example:

List<foo> foos = SelectAllfoos();

foreach (foo f in foos) {
    // do something

    foreach (foo2 f2 in f.GetFoos2()) {
        // do something
    }

    foreach (foo3 f3 in f.GetFoos3()) {
        // do something
    }

    foreach (foo4 f4 in f.GetFoos4()) {
        // do something

        foreach (foo4_1 f4_1 in f4.GetFoos4_1()) {
            // do something
        }
    }
}

Obviously this is fake code I just invented for this example, but imagine you have something like that. How would you improve this method's performance?

PS: I already tried using System.Threading.Tasks.Parallel.ForEach and it improves performance, but I'm looking for a better way to write the code itself.

PPS: this is written in C#, but my question has a wider scope; I'm looking for something useful in all languages.

Upvotes: 1

Views: 1203

Answers (1)

user4842163


Since the question is rather general and shows only the loops, which provide no information about the actual work being done, I can only provide a general answer.

The last thing you typically want to focus on is the loop mechanics themselves. These often yield little, if any, impact.

Typically, if you're in a situation where algorithmic improvements are out (e.g., sequential loops that cannot do better than linear-time complexity because they have to visit and do something with every single element no matter what), then the two biggest improvements will often come from parallelization and memory optimization.
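On the parallelization side, the usual starting point is the outermost loop, provided the work done for each element is independent and thread-safe. Here is a minimal, self-contained sketch along the lines of what you already tried, using the question's placeholder names (foo, SelectAllfoos):

using System.Collections.Generic;
using System.Threading.Tasks;

// Placeholder types standing in for the question's pseudocode.
class foo2 { }

class foo
{
    public List<foo2> GetFoos2() { return new List<foo2>(); }
}

class Example
{
    static List<foo> SelectAllfoos() { return new List<foo>(); }

    static void Process()
    {
        List<foo> foos = SelectAllfoos();

        // Parallelize only the outermost loop so each unit of work stays
        // coarse-grained; the loop body must be independent and thread-safe.
        Parallel.ForEach(foos, f =>
        {
            // do something with f

            foreach (foo2 f2 in f.GetFoos2())
            {
                // do something with f2
            }
        });
    }
}

Parallelizing the outer loop alone is usually enough; nesting Parallel.ForEach inside the inner loops tends to add scheduling overhead without much gain once the outer loop already has enough items to keep all cores busy.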

Memory optimization is unfortunately less discussed, especially in higher-level languages, but often carries just as much or more impact. It can improve execution times by orders of magnitude, and it applies regardless of the language. Cache efficiency is not a language-dependent concept: the hardware remains the same no matter what programming language we use (though how we achieve it can vary considerably between languages).

Memory Access Patterns

For example, take an image processing algorithm. A loop that accesses pixels one horizontal scanline at a time can significantly outperform a loop that accesses pixels one vertical column at a time, even when the two versions execute otherwise identical machine instructions with the same total instruction-level cost (though instruction costs are variable) and differ only in the order in which they touch memory.

It's because, put crudely, computers fetch data from slower forms of memory into faster forms of memory in contiguous chunks (pages, cache lines). When you access pixels of an image horizontally, an adjacent, horizontal chunk of pixels is fetched from a slower form of memory into a faster form, and you end up using all the neighboring pixels from that faster memory before moving on to the next series of pixels. When you access pixels of an image vertically, you load a horizontal chunk of neighboring pixels into faster memory only to use a single pixel from it. The resulting cache misses can significantly slow down the image algorithm, since we're failing to use all the data that was loaded into the smaller but faster form of memory before it gets evicted (we're basically wasting much of the benefit of that smaller but faster memory).
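To make that concrete, here is a small, self-contained sketch (not tied to any particular image library) that sums the same two-dimensional array twice, once a row at a time and once a column at a time. On typical hardware the row-wise version tends to be noticeably faster, purely because it walks memory contiguously:

using System;
using System.Diagnostics;

class TraversalOrder
{
    static void Main()
    {
        const int n = 4096;
        var image = new int[n, n];            // .NET stores int[,] row by row

        var sw = Stopwatch.StartNew();
        long sumRows = 0;
        for (int y = 0; y < n; y++)           // one horizontal scanline at a time
            for (int x = 0; x < n; x++)
                sumRows += image[y, x];       // contiguous memory accesses
        Console.WriteLine($"row-wise:    {sw.ElapsedMilliseconds} ms (sum {sumRows})");

        sw.Restart();
        long sumCols = 0;
        for (int x = 0; x < n; x++)           // one vertical column at a time
            for (int y = 0; y < n; y++)
                sumCols += image[y, x];       // strided, cache-unfriendly accesses
        Console.WriteLine($"column-wise: {sw.ElapsedMilliseconds} ms (sum {sumCols})");
    }
}

Both loops perform exactly the same additions and array reads; only the order of the memory accesses differs.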

So typically if you want to make loops go faster, and algorithmic improvements are out, you want to analyze the way that memory is being accessed and potentially change even the memory layout of the data structures involved. Computers like it when you access contiguous data close together in memory, and don't like it so much when you're accessing memory in a chaotic way that's going all over the place. They like arrays which pack their memory contents tightly together a lot more than linked structures which scatter the memory all over the place (unless the linked structures or their memory allocators are carefully designed not to do that). Speedy loops don't come from changing the mechanics of the loop so much as what the loops are doing, but deeper than algorithmic improvements and perhaps even parallelization are those memory-related optimizations coming from a data-oriented design mindset. In languages like C#, one of the techniques to get better locality of reference out of your data structures is object pooling.
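As a rough illustration of that last point, here is a minimal object-pool sketch (the Particle type and the pool size are hypothetical; a production pool would also need to think about capacity limits and thread safety). Allocating all the instances up front in one batch tends to place them close together on the managed heap, and reusing them avoids churn that would otherwise scatter fresh allocations around:

using System.Collections.Generic;

// Hypothetical pooled object.
class Particle
{
    public float X, Y, VelX, VelY;
    public void Reset() { X = Y = VelX = VelY = 0f; }
}

class ParticlePool
{
    private readonly Stack<Particle> _free = new Stack<Particle>();

    public ParticlePool(int capacity)
    {
        // Preallocate in one batch so the instances end up near each other.
        for (int i = 0; i < capacity; i++)
            _free.Push(new Particle());
    }

    public Particle Rent()
    {
        return _free.Count > 0 ? _free.Pop() : new Particle();
    }

    public void Return(Particle p)
    {
        p.Reset();
        _free.Push(p);
    }
}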

Loop Tiling/Blocking

Occasionally there are opportunities where you can improve the memory access patterns by simply changing the way you loop over the data without actually changing the way the data is represented. One such example is loop tiling (aka loop blocking): https://software.intel.com/en-us/articles/how-to-use-loop-blocking-to-optimize-memory-use-on-32-bit-intel-architecture. But again, here the speedup isn't coming from optimizing how you write the loop, per se, but optimizing the way you traverse the data in a way that exploits locality of reference. It's still entirely about memory access.
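As a rough sketch of the idea, here is a blocked matrix transpose next to the naive version (the block size of 64 is an assumption you would tune to the target cache; both versions produce the same result):

using System;

static class Transpose
{
    // Naive transpose: the writes to dst stride through memory column by column.
    public static void Naive(int[,] src, int[,] dst, int n)
    {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                dst[j, i] = src[i, j];
    }

    // Blocked (tiled) transpose: process small square tiles so that both the
    // reads and the writes stay within cache-sized regions of memory.
    public static void Blocked(int[,] src, int[,] dst, int n, int block = 64)
    {
        for (int ii = 0; ii < n; ii += block)
            for (int jj = 0; jj < n; jj += block)
                for (int i = ii; i < Math.Min(ii + block, n); i++)
                    for (int j = jj; j < Math.Min(jj + block, n); j++)
                        dst[j, i] = src[i, j];
    }
}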

Profiling

All of these micro-level optimization techniques have a tendency to make your code harder to maintain, so they're almost always best applied in hindsight with plenty of profiling measurements in your hand. The first thing to learn about optimization in general is how to measure, and to do it based on hard data rather than hunches. Beginners tend to want to optimize more, not less, because they're working from guesses about what might be inefficient instead of hard data and proper measurements. Spotting glaring algorithmic bottlenecks by eye is easy, but anything else typically demands a profiler in your hand.

A good optimizer is a sniper dispatching hotspots, not a grenadier blindly hurling grenades at anything that might slow things down. In fact, knowing how to prioritize optimizations properly and to make the proper measurements is probably even more important than understanding the inner workings of the machine. So, beyond all of this, if you want to make your loops go faster, first grab a profiler and learn how to measure inefficiencies properly. The first thing to ask is not how to make things faster so much as what actually needs to be faster (and, just as importantly, what doesn't).
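A real profiler is the right tool for finding hotspots, but even a crude timing harness beats guessing. Here is a minimal sketch using System.Diagnostics.Stopwatch (DoWork is a hypothetical stand-in for whatever code you are measuring):

using System;
using System.Diagnostics;

class Measure
{
    static void Main()
    {
        DoWork();                          // warm-up run so JIT compilation doesn't skew the timing

        const int runs = 10;
        long checksum = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)     // repeat to average out noise
            checksum += DoWork();
        sw.Stop();

        Console.WriteLine($"avg: {sw.Elapsed.TotalMilliseconds / runs} ms (checksum {checksum})");
    }

    // Hypothetical stand-in for the code under test; the checksum keeps the
    // work from being optimized away.
    static long DoWork()
    {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += i;
        return sum;
    }
}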

Upvotes: 2
