user1940040

Why is gccgo slower than gc in that particular case?

I'm sure everyone who knows Go is familiar with that blog post here.

Reading it again, I wondered if using gccgo instead of go build would increase the speed a bit more. In my typical use case (scientific computations), a gccgo-generated binary is always faster than a go build-generated one.

So, just grab this file: havlak6.go and compile it:

go build -o havlak6_go havlak6.go
gccgo -o havlak6_gccgo -march=native -Ofast havlak6.go

Surprise!

$/usr/bin/time ./havlak6_go
5.45user 0.06system 0:05.54elapsed 99%CPU

$/usr/bin/time ./havlak6_gccgo
11.38user 0.16system 0:11.74elapsed 98%CPU

I'm curious why an "optimizing" compiler produces slower code.

I tried to use gprof on the gccgo-generated binary:

gccgo -pg -march=native -Ofast havlak6.go
./a.out
gprof a.out gmon.out

with no luck:

Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

As you can see, the code was not actually profiled.

Of course, I read this, but as you can see, the program takes 10+ seconds to execute... The number of samples should be > 1000.

I also tried:

rm a.out gmon.out
LDFLAGS='-g -pg' gccgo -g -pg -march=native -Ofast havlak6.go
./a.out
gprof

No success either.

Do you know what's wrong? Do you have an idea why gccgo, with all its optimization passes, fails to be faster than gc in this case?

go version: 1.0.2 gcc version: 4.7.2

EDIT:

Oh, I completely forgot to mention... I obviously tried pprof on the gccgo-generated binary... Here is a top10:

Welcome to pprof!  For help, type 'help'.
(pprof) top10
Total: 1143 samples
    1143 100.0% 100.0%     1143 100.0% 0x00007fbfb04cf1f4
       0   0.0% 100.0%      890  77.9% 0x00007fbfaf81101e
       0   0.0% 100.0%        4   0.3% 0x00007fbfaf8deb64
       0   0.0% 100.0%        1   0.1% 0x00007fbfaf8f2faf
       0   0.0% 100.0%        3   0.3% 0x00007fbfaf8f2fc5
       0   0.0% 100.0%        1   0.1% 0x00007fbfaf8f2fc9
       0   0.0% 100.0%        1   0.1% 0x00007fbfaf8f2fd6
       0   0.0% 100.0%        1   0.1% 0x00007fbfaf8f2fdf
       0   0.0% 100.0%        2   0.2% 0x00007fbfaf8f4a2f
       0   0.0% 100.0%        1   0.1% 0x00007fbfaf8f4a33

And that's why I'm looking for something else.

EDIT2:

Since it seems that someone wants my question closed: I did not try gprof out of the blue, see https://groups.google.com/d/msg/golang-nuts/1xESoT5Xcd0/bpMvxQeJguMJ

Upvotes: 6

Views: 2415

Answers (2)

Ted Kotz

Reputation: 1

Remember that go build also defaults to static linking, so for an apples-to-apples comparison you should give gccgo the -static or -static-libgo option.

Upvotes: 0

user811773

Running the gccgo-generated binary under Valgrind seems to indicate that gccgo has an inefficient memory allocator. This may be one of the reasons why gccgo 4.7.2 is slower than go 1.0.2. It is impossible to run a binary generated by go 1.0.2 under Valgrind, so it is hard to confirm for a fact whether memory allocation is gccgo's primary performance problem in this case.
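One way to probe the allocator hypothesis without Valgrind is an allocation-heavy microbenchmark built with both compilers. The sketch below is only an illustration under assumptions: the node type, the buildList and length helpers, and the count of 5,000,000 objects are made up for this example and are not taken from havlak6.go, so a difference here would hint at allocation cost but would not prove it is the cause of the slowdown in the question.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// node is a small heap object, similar in spirit to the many small
// structures a graph algorithm allocates (hypothetical type).
type node struct {
	next *node
	id   int
}

// buildList allocates n nodes one at a time and links them, so the
// running time is dominated by the memory allocator.
func buildList(n int) *node {
	var head *node
	for i := 0; i < n; i++ {
		head = &node{next: head, id: i}
	}
	return head
}

// length walks the list so the allocations cannot be optimized away.
func length(head *node) int {
	count := 0
	for p := head; p != nil; p = p.next {
		count++
	}
	return count
}

func main() {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	start := time.Now()
	head := buildList(5000000) // arbitrary size chosen for this sketch
	elapsed := time.Since(start)

	runtime.ReadMemStats(&after)

	fmt.Println("nodes:", length(head))
	fmt.Println("elapsed:", elapsed)
	fmt.Println("heap allocations:", after.Mallocs-before.Mallocs)
}
```

Building this file once with go build and once with gccgo, using the same flags as in the question, and comparing the elapsed times would give a rough indication of how much the two runtimes differ in allocation cost.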

Upvotes: 2
