Reputation: 14869
While trying to speed up my applications on non-NUMA / standard PCs, I always found that the bottleneck was the call to malloc(), because even on multi-core machines it is shared and synchronized between all the cores.
I have access to a PC with a NUMA architecture, running Linux and programmed in C, and I have two questions:
1. Does malloc() execute independently on each core/memory without blocking the other cores?
2. How are calls to memcpy() made? Can it be called independently on each core, or will calling it on one core block the others? I may be wrong, but I remember that memcpy() has the same problem as malloc(), i.e. when one core is using it the others have to wait.
Upvotes: 11
Views: 2445
Reputation: 2813
A NUMA machine is a shared-memory system, so any processor can access any memory directly, without blocking on another processor. If the memory model were message-based instead, accessing remote memory would require the executing processor to ask the processor local to that memory to perform the operation on its behalf. Even in a NUMA system, however, a remote processor's accesses can still reduce the performance of the local processor, because they use the same memory links; how much depends on the specific architectural configuration.
As for 1, this depends entirely on the OS and the malloc library. The OS is responsible for presenting the per-core / per-processor memory either as a unified space or as NUMA, and malloc may or may not be NUMA-aware. More fundamentally, the malloc implementation may or may not be able to execute concurrently with other requests. The answer from Al (and the associated discussion) addresses this point in greater detail.
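To make the malloc point concrete, here is a minimal sketch of several POSIX threads allocating at the same time. POSIX requires malloc() to be thread-safe, so the calls are correct; whether they contend on an internal lock is up to the allocator, and this sketch does not measure that. The names `alloc_worker` and `run_concurrent_mallocs` and all the constants are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define NUM_THREADS 4           /* hypothetical worker count */
#define ALLOCS_PER_THREAD 1000  /* hypothetical workload per worker */
#define BLOCK_SIZE 256          /* hypothetical allocation size in bytes */

/* Each worker allocates, touches, and frees its own blocks.
 * No user-level lock is taken around malloc()/free(). */
static void *alloc_worker(void *arg)
{
    size_t *completed = arg;
    assert(completed != NULL);

    for (int i = 0; i < ALLOCS_PER_THREAD; i++) {
        unsigned char *p = malloc(BLOCK_SIZE);
        if (p == NULL)
            break;                    /* stop counting on allocation failure */
        memset(p, 0xAB, BLOCK_SIZE);  /* write to the block so it is really used */
        free(p);
        (*completed)++;
    }
    return NULL;
}

/* Runs the workers concurrently. Returns the total number of successful
 * allocations, or -1 if a thread could not be created. */
long run_concurrent_mallocs(void)
{
    pthread_t threads[NUM_THREADS];
    size_t counts[NUM_THREADS] = {0};
    int started = 0;
    long total = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        if (pthread_create(&threads[t], NULL, alloc_worker, &counts[t]) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++) {
        pthread_join(threads[t], NULL);
        total += (long)counts[t];
    }
    return (started == NUM_THREADS) ? total : -1;
}
```

Timing `run_concurrent_mallocs()` with different values of `NUM_THREADS`, or with a different malloc library linked in, is one way to see how well a particular allocator scales on a given machine. Build with `-pthread`.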
As for 2, since memcpy consists of a series of loads and stores, the only impact would again be the potential architectural effects of using the other processors' memory controllers, etc.
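As an illustration of the memcpy point, the sketch below has several threads call memcpy() at the same time on disjoint regions, with no lock anywhere. It only shows that the concurrent calls are correct; it says nothing about memory-link or controller contention, which depends on the hardware. The names `copy_worker`, `run_parallel_memcpy`, and `struct copy_job`, and the buffer sizes, are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define NUM_THREADS 4  /* hypothetical worker count */
#define CHUNK 4096     /* hypothetical bytes copied per worker */

static unsigned char src[NUM_THREADS * CHUNK];
static unsigned char dst[NUM_THREADS * CHUNK];

struct copy_job {
    size_t offset;  /* start of this worker's private region */
};

/* Copies one region. Regions do not overlap, so no synchronization
 * is needed between the concurrent memcpy() calls. */
static void *copy_worker(void *arg)
{
    const struct copy_job *job = arg;
    assert(job->offset + CHUNK <= sizeof dst);
    memcpy(dst + job->offset, src + job->offset, CHUNK);
    return NULL;
}

/* Returns 0 if dst equals src after the parallel copy, -1 if a thread
 * could not be created, and another nonzero value on a mismatch. */
int run_parallel_memcpy(void)
{
    pthread_t threads[NUM_THREADS];
    struct copy_job jobs[NUM_THREADS];
    int started = 0;

    /* Fill the source with a recognizable pattern and clear the destination. */
    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i % 251);
    memset(dst, 0, sizeof dst);

    for (int t = 0; t < NUM_THREADS; t++) {
        jobs[t].offset = (size_t)t * CHUNK;
        if (pthread_create(&threads[t], NULL, copy_worker, &jobs[t]) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(threads[t], NULL);

    if (started != NUM_THREADS)
        return -1;
    return memcmp(dst, src, sizeof src);
}
```

Because each worker only issues loads and stores to its own region, the threads never wait on one another in software; any slowdown when scaling up would come from the shared memory hardware. Build with `-pthread`.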
Upvotes: 6
Reputation: 583
Upvotes: 2