Reputation: 335
I've written some C++ backpropagation code which I'm running on an i9-9900K under Ubuntu 18.04.
The issue I'm seeing is that multithreaded performance gets progressively worse with newer versions of g++.
Single-threaded benchmarks improve as expected with newer g++ versions:
g++ 4.8: 5437 cycles/s
g++ 5.5: 5929 cycles/s
g++ 6.5: 5932 cycles/s
g++ 7.4: 6117 cycles/s
g++ 8.3: 6921 cycles/s
Multi-threaded benchmarks (14 pthreads on 8 cores) degrade significantly with newer versions:
g++ 4.8: 25456 cycles/s
g++ 5.5: 17212 cycles/s
g++ 6.5: 18616 cycles/s
g++ 7.4: 17054 cycles/s
g++ 8.3: 14797 cycles/s
I've seen similar behavior on CentOS 7.6 and Clear Linux as well. Across all tested operating systems, the fastest performance came from using 14 threads with g++ 4.8.
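For context, the benchmark has roughly this shape: worker pthreads each run training cycles in a loop while a shared counter tracks cycles/s. This is a minimal sketch with a made-up dummy workload and working-set size, not my actual backprop code:

    // Minimal sketch of the benchmark structure (hypothetical workload and sizes).
    // Build with: g++ -std=c++11 -Ofast -pthread sketch.cpp
    #include <pthread.h>
    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <thread>
    #include <vector>

    static std::atomic<long> cycles(0);
    static std::atomic<bool> stop(false);
    static volatile float sink; // keep the optimizer from deleting the work

    // Stand-in for one backprop training cycle: a vectorizable weight update.
    static void training_cycle(std::vector<float>& w) {
        for (float& x : w) x = x * 0.999f + 0.001f;
    }

    static void* worker(void*) {
        std::vector<float> weights(1 << 16, 1.0f); // per-thread working set (assumed size)
        while (!stop.load(std::memory_order_relaxed)) {
            training_cycle(weights);
            cycles.fetch_add(1, std::memory_order_relaxed);
        }
        sink = weights[0];
        return nullptr;
    }

    int main() {
        const int nthreads = 14; // 14 pthreads on 8 cores, as benchmarked above
        std::vector<pthread_t> tids(nthreads);
        for (pthread_t& t : tids) pthread_create(&t, nullptr, worker, nullptr);
        std::this_thread::sleep_for(std::chrono::seconds(10));
        stop.store(true);
        for (pthread_t& t : tids) pthread_join(t, nullptr);
        std::printf("%ld cycles/s\n", cycles.load() / 10);
        return 0;
    }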
Here are the compilation flags I'm using: g++ -c -std=c++11 -march=native -Ofast
Am I using the wrong compilation flags? I've tried -O3; the degradation is similar, though less extreme (and slower overall than -Ofast):
g++ 4.8 -O3: 17256 cycles/s
g++ 5.5 -O3: 15129 cycles/s
g++ 6.5 -O3: 15779 cycles/s
g++ 7.4 -O3: 15736 cycles/s
g++ 8.3 -O3: 13361 cycles/s
I suspect I'm running into a memory-bandwidth bottleneck with this many cores. Are there any compilation options that can reduce the memory pressure from so many threads?
Upvotes: 4
Views: 167
Reputation: 335
Further testing revealed that the issue was related to the -march=native optimization flag.
g++ 4.8 treats the i9-9900K natively as core-avx2, which activates MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AES, and PCLMUL.
g++ 4.9 and greater treat the i9-9900K natively as broadwell, which activates MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, and PREFETCHW.
Apparently the AVX/AVX2 code generation this enables somehow results in over-optimization for this workload.
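If you want to check what -march=native resolves to on your own machine, g++ will report it (the exact output format differs between versions; command shown for illustration):

    g++ -march=native -Q --help=target | grep march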
Removing the -march flag altogether fixed the issue. Disabling AVX with -mno-avx and -mno-avx2 also worked.
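In other words, either of these flag sets avoids the problem (the same flags as above, with the fix applied):

    g++ -c -std=c++11 -Ofast
    g++ -c -std=c++11 -march=native -mno-avx -mno-avx2 -Ofast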
Upvotes: 1