Z boson

Reputation: 33669

NUMA systems, virtual pages, and false sharing

As I understand things, for performance on NUMA systems, there are two cases to avoid:

  1. threads in the same socket writing to the same cache line (usually 64 bytes)
  2. threads from different sockets writing to the same virtual page (usually 4096 bytes)

A simple example will help. Let's assume I have a two-socket system and each socket has a CPU with two physical cores (and two logical cores, i.e. no Intel Hyper-Threading or AMD two-cores-per-module). Let me borrow the diagram at OpenMP: for schedule

| socket 0    | core 0 | thread 0 |
|             | core 1 | thread 1 |

| socket 1    | core 2 | thread 2 |
|             | core 3 | thread 3 |

So based on case 1 it's best to avoid e.g. thread 0 and thread 1 writing to the same cache line and based on case 2 it's best to avoid e.g. thread 0 writing to the same virtual page as thread 2.

However, I have been informed that on modern processors the second case is no longer a concern: threads on different sockets can write to the same virtual page efficiently (as long as they don't write to the same cache line).

Is case two no longer a problem? And if it is still a problem, what's the correct terminology for it? Is it correct to call both cases a kind of false sharing?

Upvotes: 6

Views: 649

Answers (1)

Aaron Altman

Reputation: 1755

You're right about case 1. Some more details about case 2:

Based on the operating system's NUMA policy and any related migration issues, the physical location of the page that threads 0 and 2 are writing to could be socket 0 or socket 1. The cases are symmetrical so let's say that there's a first touch policy and that thread 0 gets there first. The sequence of operations could be:

  1. Thread 0 allocates the page.
  2. Thread 0 does a write to the cache line it'll be working on. That cache line transitions from invalid to modified within cache(s) on socket 0.
  3. Thread 2 does a write to the cache line it'll be working on. To put that line in exclusive state, socket 1 has to send a Read For Ownership to socket 0 and receive a response.
  4. Threads 0 and 2 can go about their business. As long as thread 0 doesn't touch thread 2's cache line or vice versa and nobody else does anything that would change the state of either line, all operations that thread 0 and thread 2 are doing are socket- (and possibly core-) local.

You could swap the order of steps 2 and 3 without affecting the outcome. Either way, the round trip between sockets in step 3 takes longer than the socket-local access in step 2, but that cost is only incurred once each time thread 2 needs to put its line into the Modified state. If execution continues for long enough between transitions of that cache line's state, the extra cost will amortize.

Upvotes: 2

Related Questions