OnesAndZeroes

Reputation: 335

Multi-threaded degradation of performance with newer versions of g++?

I've written some C++ backpropagation code which I'm running on an i9-9900K under Ubuntu 18.04.

The issue I'm seeing is that multithreaded performance gets progressively worse with newer versions of g++.

Single threaded benchmarks improve as expected with newer g++ versions:

g++ 4.8: 5437 cycles/s
g++ 5.5: 5929 cycles/s
g++ 6.5: 5932 cycles/s
g++ 7.4: 6117 cycles/s
g++ 8.3: 6921 cycles/s

Multi threaded benchmarks (14 pthreads on 8 cores) degrade significantly with newer versions:

g++ 4.8: 25456 cycles/s
g++ 5.5: 17212 cycles/s
g++ 6.5: 18616 cycles/s
g++ 7.4: 17054 cycles/s
g++ 8.3: 14797 cycles/s

I've seen similar behavior on CentOS 7.6 and Clear Linux as well. Across all tested OSes, the fastest performance came from running 14 threads with g++ 4.8.

Here are the compilation flags I'm using: g++ -c -std=c++11 -march=native -Ofast

Am I using the wrong compilation flags? I've tried -O3, and the degradation is similar though less extreme (and slower overall than -Ofast):

g++ 4.8 -O3: 17256 cycles/s
g++ 5.5 -O3: 15129 cycles/s
g++ 6.5 -O3: 15779 cycles/s
g++ 7.4 -O3: 15736 cycles/s
g++ 8.3 -O3: 13361 cycles/s

I suspect I'm running into a memory bandwidth limit with this many threads. Are there any compilation options that can help ease the memory pressure?

Upvotes: 4

Views: 167

Answers (1)

OnesAndZeroes

Reputation: 335

Further testing revealed that the issue was related to the -march=native optimization flag.

Under -march=native, g++ 4.8 treats the i9-9900K as core-avx2, which activates: MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AES, and PCLMUL.

g++ 4.9 and later treat the i9-9900K as broadwell, which activates: MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, and PREFETCHW.

Apparently this somehow results in over-optimization: the wider AVX2/FMA code paths enabled for broadwell appear to hurt multithreaded throughput here.

Removing the -march flag altogether fixed the issue. Keeping it but disabling AVX code generation with -mno-avx and -mno-avx2 also worked.

Upvotes: 1
