Reputation: 11
This may be an open question, but I will start from the Scala language. Scala favors asynchronous programming. Under Scala's Future is an ExecutionContext, which we can think of as either a ForkJoinPool task or a ThreadPool thread. This means that code in different contexts may actually execute on the same thread, while code in the same call stack may be broken into pieces that run on different threads. As we all know, modern CPUs have L1/L2/L3 caches, and code that can leverage those caches runs faster than code that reads from main memory. But with asynchronous programming, the same logical context may be spread over different threads, those threads may or may not run on the same core, and code in one Future cannot reuse what another Future has brought into the cache. So the question is: asynchronous programming can execute code more efficiently by breaking a long call into small pieces, but it gives up the benefit of reading from the CPU cache. Is that a good thing or a bad thing, or is my understanding totally wrong?
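The situation I mean can be sketched like this (a minimal example using the global ExecutionContext, which is backed by a ForkJoinPool; the object and method names are just for illustration). Each stage of the chained Future is submitted to the pool, so consecutive stages of one logical call may land on different threads:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object FutureThreads {
  // Runs a two-stage Future and reports which thread each stage saw.
  // The two names may or may not be equal: the pool decides.
  def stageThreads(): (String, String) = {
    val f = Future(Thread.currentThread.getName)          // stage 1
      .map(t1 => (t1, Thread.currentThread.getName))      // stage 2
    Await.result(f, 5.seconds)
  }

  def main(args: Array[String]): Unit = {
    val (t1, t2) = stageThreads()
    println(s"stage 1 ran on $t1, stage 2 ran on $t2")
  }
}
```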
Upvotes: 0
Views: 142
Reputation: 27421
You are right that this is too broad for this forum, but here are some comments.
Code does not execute more efficiently if it is broken into smaller pieces. It is always less efficient to do this, though it does allow more parallelism and therefore may execute faster on multi-core processors. The main reason to break code into threads is to reduce latency and to allow different parts of a program to operate independently for better separation of concerns. It can also improve performance when accessing slow devices, but this is mostly dealt with by the OS anyway.
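A hedged sketch of that trade-off (the name SplitWork and the chunk count are illustrative only): splitting a sum into Future pieces adds a scheduling and coordination cost per piece, but lets the pieces run on multiple cores, which is where the possible speed-up comes from.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object SplitWork {
  // Splits the input into roughly `pieces` chunks, sums each chunk in
  // its own Future, then combines the partial sums. Each Future adds
  // overhead; the gain, if any, comes from chunks running in parallel.
  def parallelSum(xs: Vector[Long], pieces: Int): Long = {
    val chunkSize = math.max(1, xs.size / pieces)
    val partials  = xs.grouped(chunkSize).toVector.map(c => Future(c.sum))
    Await.result(Future.sequence(partials), 1.minute).sum
  }

  def main(args: Array[String]): Unit = {
    val data = Vector.tabulate(1000000)(_.toLong)
    // Produces the same result as the plain sequential data.sum.
    println(parallelSum(data, 8))
  }
}
```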
You are right about the potential cost of moving threads between cores, but the scheduler is well aware of cache performance issues and will aim to keep threads on the same core or core group if possible (again, it depends on the processor and memory architecture).
You mention reading code from the cache, but it is usually the data accesses that put the most pressure on the cache and memory system. This is one area where functional code can help, because it tends to read from one part of memory and write to another part, which is usually more efficient than reading and writing the same parts of memory.
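To illustrate the two access patterns (purely a sketch, not a benchmark; the names are made up for this example): the functional transform streams reads from one array and writes into a fresh one, while the imperative version interleaves reads and writes on the same memory.

```scala
object AccessPatterns {
  // Functional style: reads src left to right, writes a new array.
  // Two separate sequential streams through memory.
  def doubledCopy(src: Array[Int]): Array[Int] = src.map(_ * 2)

  // Imperative style: reads and writes interleave on the same buffer.
  def doubleInPlace(buf: Array[Int]): Unit = {
    var i = 0
    while (i < buf.length) { buf(i) *= 2; i += 1 }
  }
}
```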
Scala programs cannot directly "leverage" the cache, and it is dangerous to try unless you know an awful lot about the processor in question. Even if you manage to make the code perform particularly well on one processor it is unlikely to work well on a different processor, and especially a different architecture.
As with all these kinds of issues, there are some basic rules that will make code run more or less efficiently, but performance optimisation should be done very carefully and concentrated on the areas of code that can be proven to be critical to the performance of the program.
Upvotes: 2