I am trying to get close to the maximum memory bandwidth on my system, which has one DDR channel and 4 cores and a theoretical peak bandwidth of 25.5 GB/s.
I tried running the following stress-ng benchmark:
./stress-ng --taskset 0xf --memrate 1 --memrate-wr-mbs 50000 --memrate-rd-mbs 30000 -t 60
But the maximum bandwidth I see is around 11000 MB/s, which is about 43% of the theoretical peak.
I also came across this blog post on achieving maximum memory bandwidth:
https://codearcana.com/posts/2013/05/18/achieving-maximum-memory-bandwidth.html
It uses a rep stosq routine along these lines:
void write_memory_rep_stosq(void* buffer, size_t size) {
    // size in bytes, assumed to be a multiple of 8
    size_t count = size / 8;
    asm volatile("cld\n"  // usually unnecessary, compilers keep DF=0
                 "rep stosq"
                 // rep stosq modifies RDI and RCX, so they must be
                 // read-write operands ("+D" / "+c"), not pure inputs
                 : "+D" (buffer), "+c" (count)
                 : "a" ((size_t)0)
                 // "memory" clobber: the asm writes memory through buffer
                 : "memory");
}
When I run it, I get results that are really close to the peak bandwidth, thanks to modern x86 features like ERMSB, which handle this with optimized microcode.
$ ./memory_profiler
write_memory_rep_stosq: 20.60 GiB/s
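For context, a figure like this is just bytes written divided by elapsed time. Below is a minimal sketch of how such a number could be measured portably (so it also builds on ARM64), assuming a plain memset kernel in place of the rep stosq routine. The names gib_per_sec, time_writes and write_memory_memset are hypothetical and not taken from the blog or its memory_profiler.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime under strict -std=c11 */
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: convert `bytes` moved in `seconds` to GiB/s. */
static double gib_per_sec(size_t bytes, double seconds) {
    return (double)bytes / (1024.0 * 1024.0 * 1024.0) / seconds;
}

/* Hypothetical portable kernel: zero the buffer with memset.
 * On any target the C library picks its own optimized implementation. */
static void write_memory_memset(void* buffer, size_t size) {
    memset(buffer, 0, size);
}

/* Hypothetical harness: run `iters` passes of `kernel` over the buffer
 * and return the elapsed wall-clock time in seconds. */
static double time_writes(void (*kernel)(void*, size_t),
                          void* buffer, size_t size, int iters) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++) {
        kernel(buffer, size);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
}
```

To use it, allocate a buffer much larger than the last-level cache, call time_writes(write_memory_memset, buf, size, iters), and pass size * iters and the returned time to gib_per_sec. Note that the result is in GiB/s while the 25.5 GB/s peak is in decimal units, so the two need converting before they are compared.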
However, rep stosq is specific to x86_64. Is there an equivalent instruction for ARM64?