Reputation:
I'm sure everyone who knows golang knows the blog post here.
Reading it again, I wondered whether using gccgo instead of go build would increase the speed a bit more. In my typical use case (scientific computations), a gccgo-generated binary is always faster than a go build-generated one.
So, just grab this file: havlak6.go and compile it:
go build -o havlak6_go havlak6.go
gccgo -o havlak6_gccgo -march=native -Ofast havlak6.go
Surprise!
$ /usr/bin/time ./havlak6_go
5.45user 0.06system 0:05.54elapsed 99%CPU
$ /usr/bin/time ./havlak6_gccgo
11.38user 0.16system 0:11.74elapsed 98%CPU
I'm curious and want to know why an "optimizing" compiler produces slower code.
I tried to use gprof on the gccgo-generated binary:
gccgo -pg -march=native -Ofast havlak6.go
./a.out
gprof a.out gmon.out
with no luck:
Flat profile:
Each sample counts as 0.01 seconds.
no time accumulated
As you can see, the code has not actually been profiled.
Of course, I read this, but the program takes 10+ seconds to execute, so at 0.01 seconds per sample the number of samples should be > 1000.
I also tried:
rm a.out gmon.out
LDFLAGS='-g -pg' gccgo -g -pg -march=native -Ofast havlak6.go
./a.out
gprof
No success either.
Do you know what's wrong? Do you have an idea of why gccgo, with all its optimization routines, fails to be faster than gc in this case?
go version: 1.0.2
gcc version: 4.7.2
EDIT:
Oh, I completely forgot to mention... I obviously tried pprof on the gccgo-generated binary... Here is a top10:
Welcome to pprof! For help, type 'help'.
(pprof) top10
Total: 1143 samples
1143 100.0% 100.0% 1143 100.0% 0x00007fbfb04cf1f4
0 0.0% 100.0% 890 77.9% 0x00007fbfaf81101e
0 0.0% 100.0% 4 0.3% 0x00007fbfaf8deb64
0 0.0% 100.0% 1 0.1% 0x00007fbfaf8f2faf
0 0.0% 100.0% 3 0.3% 0x00007fbfaf8f2fc5
0 0.0% 100.0% 1 0.1% 0x00007fbfaf8f2fc9
0 0.0% 100.0% 1 0.1% 0x00007fbfaf8f2fd6
0 0.0% 100.0% 1 0.1% 0x00007fbfaf8f2fdf
0 0.0% 100.0% 2 0.2% 0x00007fbfaf8f4a2f
0 0.0% 100.0% 1 0.1% 0x00007fbfaf8f4a33
And that's why I'm looking for something else.
EDIT2:
Since it seems that someone wants my question to be closed: I did not try to use gprof out of the blue. See https://groups.google.com/d/msg/golang-nuts/1xESoT5Xcd0/bpMvxQeJguMJ
Upvotes: 6
Views: 2415
Reputation: 1
Remember that go build also defaults to static linking, so for an apples-to-apples comparison you should give gccgo the -static or -static-libgo option.
Upvotes: 0
Reputation:
Running the gccgo-generated binary under Valgrind seems to indicate that gccgo has an inefficient memory allocator. This may be one of the reasons why gccgo 4.7.2 is slower than go 1.0.2. It is impossible to run a binary generated by go 1.0.2 under Valgrind, so it is hard to confirm for a fact whether memory allocation is gccgo's primary performance problem in this case.
Upvotes: 2