Cache-Misses in OSU Unidirectional Bandwidth Test

Question

We are measuring the cache-misses in MPI_Isend()+MPI_Waitall() at the sender and MPI_Irecv()+MPI_Waitall() at the receiver for the OSU unidirectional bandwidth benchmark. Surprisingly, the cache-misses at the 8 KB message size exceed those at the 16 KB message size when using Intel MPI 2017.4 and Intel compiler 2017.4. We capture cache-misses using the TAU profiler. Below are some graphs which capture this strange behaviour:

We have ruled out the eager-rendezvous switchover, since the default I_MPI_EAGER_THRESHOLD is 256 KB. We have also ruled out re-allocation of pre-registered buffers, since both message sizes are smaller than the default size of the pre-registered buffers (around 23.5 KB for Intel MPI).

Any insight would be valuable. Thanks!

Addition 1: The communication is between nodes connected with Intel OPA, which has a maximum bandwidth of 100 Gbit/s.

Addition 2: We suspect that the default path MTU size of 8 KB for Intel OPA has "something" to do with it, but the question remains as to why the cache-misses for a 16 KB message are fewer than those for an 8 KB message.
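
To make the arithmetic behind this suspicion concrete, here is a minimal sketch in C. It only encodes the numbers quoted above (256 KB eager threshold, roughly 23.5 KB pre-registered buffer, 8 KB path MTU); the macro and function names are hypothetical, the "less than or equal" comparisons are a simplifying assumption, and per-packet header overhead is ignored. It does not model the actual Intel MPI or OPA internals.

```c
#include <stdbool.h>
#include <stddef.h>

/* Values quoted in the question; treat them as assumptions for any other system. */
#define EAGER_THRESHOLD_BYTES (256u * 1024u) /* default I_MPI_EAGER_THRESHOLD        */
#define PREREG_BUFFER_BYTES   24064u         /* ~23.5 KB pre-registered buffer size  */
#define PATH_MTU_BYTES        (8u * 1024u)   /* default Intel OPA path MTU           */

/* Number of MTU-sized payloads needed to carry one message (ceiling division).
 * Simplification: per-packet header overhead is ignored. */
static size_t mtu_packets(size_t msg_bytes, size_t mtu_bytes)
{
    return (msg_bytes + mtu_bytes - 1) / mtu_bytes;
}

/* True if the message is small enough to stay on the eager path
 * (assumed here to mean "at or below the threshold"). */
static bool below_eager_threshold(size_t msg_bytes)
{
    return msg_bytes <= EAGER_THRESHOLD_BYTES;
}

/* True if the message fits in one pre-registered buffer of the quoted size. */
static bool fits_prereg_buffer(size_t msg_bytes)
{
    return msg_bytes <= PREREG_BUFFER_BYTES;
}
```

Under these assumptions an 8 KB message fits in exactly one MTU-sized payload while a 16 KB message needs two, and both sizes stay below the eager threshold and inside a pre-registered buffer. So the only boundary from this list that separates the two sizes is the MTU, which is why we suspect it.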

Answers (0)