Reputation: 1
I maintain a C# multi-process application which has one scheduler process that assigns actions and N worker processes that execute them. The actions are independent of each other, so the worker processes don't communicate with one another. N is configurable, and I keep it equal to the number of the machine's CPU cores.
The questions are:
1 - Would the program run faster if I pinned each worker process to a core via the ProcessorAffinity property?
2 - If I have 2 NUMA nodes, how should I configure the worker processes so that the cores of one NUMA node don't access the other node's memory?
I know that the OS tries to keep a process executing on one node and allocating memory from that same node, but I'm not sure whether it would be better to add these restrictions explicitly.
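To make question 1 concrete, the kind of pinning I mean looks like this (a minimal sketch; `MaskForCores` is just a helper I made up for illustration, and the core numbering is assumed to be contiguous per NUMA node):

```csharp
using System;
using System.Diagnostics;

class AffinityDemo
{
    // Build an affinity bitmask covering cores [first, first + count).
    // Hypothetical helper: bit i set means the process may run on core i.
    public static IntPtr MaskForCores(int first, int count)
    {
        long mask = 0;
        for (int i = first; i < first + count; i++)
            mask |= 1L << i;
        return (IntPtr)mask;
    }

    static void Main()
    {
        // Pin this worker to core 0 only (mask 0b0001).
        Process self = Process.GetCurrentProcess();
        self.ProcessorAffinity = MaskForCores(0, 1);
        Console.WriteLine($"Affinity mask: 0x{(long)self.ProcessorAffinity:X}");
    }
}
```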
Upvotes: -1
Views: 65
Reputation: 9650
I am not an expert, so maybe others with deeper knowledge can add more, but I do have some experience, so I will try to share it with you.
NUMA sensitivity
First, I would question whether you are really sure that your process is NUMA sensitive.
In the vast majority of cases, processes are not NUMA sensitive, so any such optimisation is pointless.
Each application run is likely to vary slightly and will always be affected by other processes running on the machine. So you would need to do extensive testing to show that your app is sensitive to NUMA placement and, more importantly, that it would make a big enough difference to justify the effort of adapting to it.
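One common way to test this on Linux (assuming `numactl` is installed, and using `./worker` as a stand-in for one of your worker processes) is to run the same workload with local and then remote memory and compare throughput:

```shell
# Run a worker with CPUs and memory both on node 0 (local memory).
numactl --cpunodebind=0 --membind=0 ./worker

# Same CPUs, but memory forced onto node 1 (remote memory).
# If throughput barely changes between the two runs,
# the workload is not very NUMA sensitive.
numactl --cpunodebind=0 --membind=1 ./worker
```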
O/S Scheduler
The second thing is whether you really want to try to control this in the program, in code. The O/S scheduler does its own thing, and it is likely to override what you try to do, or cause other problems, as it assigns (or has already assigned) other processes to the cores you wish to use. Different O/Ses also behave in different ways, so if you are targeting more than one O/S (or different versions), the behaviour may differ between them.
This is further complicated if you are using Kubernetes, as there are various versions in the wild; e.g. each cloud vendor has its own slightly modified version, which may or may not behave differently.
K8s can even use alternative schedulers, which would also likely affect this, although that does open up an opportunity for you to create your own scheduler to try to control it.
CPU Architecture
The third thing is that all of this depends on the architecture you are running on. An Intel Sapphire Rapids processor is different from Emerald Rapids, AMD Epyc Genoa is different from Turin, and any AMD is certainly different from Intel. They have different numbers of NUMA nodes and sub-NUMA clusters and behave differently, and the O/Ses also behave differently on the different architectures.
So if you plan on running on several architectures, it becomes very difficult to optimise for the whole range.
Our experience
What I can say is that our system consists of more than 200 individual PODs, of which only about 2 or 3 are NUMA sensitive.
Each individual POD is relatively small in terms of resource requirements, which helps deployment and reduces the likelihood of NUMA problems. A single POD with large resource requirements is more likely to have issues.
So we don't try to manipulate this in code; instead we control POD deployment through the scheduler and the Helm chart config. We try to focus on one version of Kubernetes and limit the CPU architectures as well. By ensuring that K8s deploys the PODs in the correct (for us) manner, we can avoid NUMA problems and we understand the performance of the applications better.
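As an illustration of controlling this through deployment rather than code (a hypothetical pod spec, not our actual config): if the node's kubelet runs the static CPU Manager policy, a container in the Guaranteed QoS class with integer CPU requests gets exclusive cores, and a Topology Manager policy such as `single-numa-node` keeps those cores on one NUMA node:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker             # hypothetical name
spec:
  containers:
  - name: worker
    image: example/worker  # hypothetical image
    resources:
      requests:
        cpu: "4"           # integer CPU count -> eligible for exclusive cores
        memory: "8Gi"
      limits:
        cpu: "4"           # requests == limits -> Guaranteed QoS class
        memory: "8Gi"
```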
I hope this helped on some level.
Upvotes: 3