We are measuring the cache misses in MPI_Isend() + MPI_Waitall() at the sender and MPI_Irecv() + MPI_Waitall() at the receiver for the OSU unidirectional bandwidth benchmark. Surprisingly, the cache misses at the 8 KB message size exceed those at the 16 KB message size when using Intel MPI 2017.4 and Intel compiler 2017.4. We capture cache misses using the TAU profiler. Below are some graphs which capture this strange behaviour:
We have ruled out two possible causes. The first is the eager-rendezvous switchover, since the default I_MPI_EAGER_THRESHOLD is 256 KB. The second is re-allocation of pre-registered buffers, since the message size is smaller than the default size of the pre-registered buffers (which is around 23.5 KB for Intel MPI).
Any insight would be valuable. Thanks!
Addition 1: The communication is between nodes connected with Intel OPA, which has a maximum bandwidth of 100 Gbit/s.
Addition 2: We suspect that the default path MTU size of 8 KB for Intel OPA has something to do with it, but the question remains why the cache misses for a 16 KB message are fewer than those for an 8 KB message.