Reputation: 10740
So recently I learned about the perf command in Linux. I decided to run some experiments, so I created an empty C program and measured how many instructions it took to run:
echo 'int main(){}'>emptyprogram.c && gcc -O3 emptyprogram.c -o empty
perf stat ./empty
This was the output:
 Performance counter stats for './empty':

          0.341833      task-clock (msec)         #    0.678 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               112      page-faults               #    0.328 M/sec
         1,187,561      cycles                    #    3.474 GHz
         1,550,924      instructions              #    1.31  insn per cycle
           293,281      branches                  #  857.966 M/sec
             4,942      branch-misses             #    1.69% of all branches

       0.000504121 seconds time elapsed
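One way to break that count down (a sketch using perf's standard :u/:k event modifiers, which restrict an event to user mode or kernel mode) is to count the two privilege levels separately:

perf stat -e instructions:u,instructions:k ./empty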
Why is it using so many instructions to run a program that does literally nothing? I thought this might be some baseline number of instructions needed for the OS to load a program, so I looked for a minimal executable written in assembly and found a 142-byte executable that outputs "Hi World" here (http://timelessname.com/elfbin/).
Running perf stat on the 142-byte hello executable, I get:
Hi World
 Performance counter stats for './hello':

          0.069185      task-clock (msec)         #    0.203 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                 3      page-faults               #    0.043 M/sec
           126,942      cycles                    #    1.835 GHz
           116,492      instructions              #    0.92  insn per cycle
            15,585      branches                  #  225.266 M/sec
             1,008      branch-misses             #    6.47% of all branches

       0.000340627 seconds time elapsed
This still seems a lot higher than I'd expect, but we can accept it as a baseline. In that case, why did running empty take 10x as many instructions? What did those instructions do? And if they're some sort of overhead, why is there so much variation in overhead between a C program and the hello-world assembly program?
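One way to see where those instructions actually went (a sketch; perf record samples by time, so a run this short may yield only a handful of samples) is:

perf record ./empty
perf report --stdio

On a dynamically linked binary, most samples typically land in ld.so and libc startup code rather than in main() itself.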
Upvotes: 5
Views: 935
Reputation: 20450
It's hardly fair to claim that it "does literally nothing". Yes, at the app level you chose to make the whole thing a giant no-op for your microbenchmark; that's fine. But no, down beneath the covers at the system level, it's hardly "nothing". You asked Linux to fork off a brand-new execution environment, initialize it, and hook it up to the outside world. You called very few glibc functions, but dynamic linking is non-trivial, and after a million instructions your process was ready to demand-fault printf() and friends, and to efficiently bring in libs you might have linked against or dlopen()'ed.
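One way to test the dynamic-linking claim (a sketch in the style of the question's one-liner; exact counts will vary with machine, glibc version, and kernel) is to statically link the same empty program and compare:

echo 'int main(){}' > emptyprogram.c
gcc -O3 -static emptyprogram.c -o empty_static
perf stat ./empty_static

With ld.so's relocation and symbol-resolution work out of the picture, the instruction count should drop substantially, though glibc's own static startup code still runs.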
This is not the sort of microbenchmark that implementors are likely to optimize for. What would be of interest is if you could identify "expensive" aspects of fork/exec that in some use cases are never used, and so might be #ifdef'd out (or have their execution short-circuited) in very specific situations. Lazy evaluation of resolv.conf is one example: a process that never talks to IP servers never pays that overhead.
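To estimate the floor with the C runtime removed entirely (again a sketch, assuming x86-64 Linux, where _start is the linker's default entry symbol and syscall 60 is exit):

echo 'void _start(void){ __asm__("mov $60, %eax; xor %edi, %edi; syscall"); }' > bare.c
gcc -O3 -nostdlib -static bare.c -o bare
perf stat ./bare

A binary like this measures little more than exec() and page-fault overhead, which is roughly what the 142-byte hello example is already showing.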
Upvotes: 2