Reputation: 10740
So recently I learned about the perf command in Linux. I decided to run some experiments, so I created an empty C program and measured how many instructions it took to run:
echo 'int main(){}'>emptyprogram.c && gcc -O3 emptyprogram.c -o empty
perf stat ./empty
This was the output:
 Performance counter stats for './empty':

          0.341833      task-clock (msec)         #    0.678 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               112      page-faults               #    0.328 M/sec
         1,187,561      cycles                    #    3.474 GHz
         1,550,924      instructions              #    1.31  insn per cycle
           293,281      branches                  #  857.966 M/sec
             4,942      branch-misses             #    1.69% of all branches

       0.000504121 seconds time elapsed
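One way to break that count down (a sketch using perf's standard :u/:k event modifiers, which restrict an event to user mode or kernel mode) is to count the two privilege levels separately:

perf stat -e instructions:u,instructions:k ./empty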
Why is it using so many instructions to run a program that does literally nothing? I thought this might be some baseline number of instructions needed for the OS to load a program, so I looked for a minimal executable written in assembly and found a 142-byte executable that outputs "Hi World" here (http://timelessname.com/elfbin/).
Running perf stat on the 142-byte hello executable, I get:
Hi World
 Performance counter stats for './hello':

          0.069185      task-clock (msec)         #    0.203 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                 3      page-faults               #    0.043 M/sec
           126,942      cycles                    #    1.835 GHz
           116,492      instructions              #    0.92  insn per cycle
            15,585      branches                  #  225.266 M/sec
             1,008      branch-misses             #    6.47% of all branches

       0.000340627 seconds time elapsed
This still seems a lot higher than I'd expect, but we can accept it as a baseline. In that case, why did running empty take 10x as many instructions? What did those instructions do? And if they're some sort of overhead, why is there so much variation in overhead between a C program and the hello-world assembly program?
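One way to see where those instructions actually went (a sketch; perf record samples by time, so a run this short may yield only a handful of samples) is:

perf record ./empty
perf report --stdio

On a dynamically linked binary, most samples typically land in ld.so and libc startup code rather than in main() itself.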
Upvotes: 5
Views: 935
Reputation: 20450
It's hardly fair to claim that it "does literally nothing". Yes, at the app level you chose to make the whole thing a giant no-op for your microbenchmark; that's fine. But no, down beneath the covers at the system level, it's hardly "nothing". You asked Linux to fork off a brand-new execution environment, initialize it, and hook it up to the outside world. You called very few glibc functions, but dynamic linking is non-trivial, and after a million instructions your process was ready to demand-fault printf() and friends, and to efficiently bring in libs you might have linked against or dlopen()'ed.
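One way to test the dynamic-linking claim (a sketch in the style of the question's one-liner; exact counts will vary with machine, glibc version, and kernel) is to statically link the same empty program and compare:

echo 'int main(){}' > emptyprogram.c
gcc -O3 -static emptyprogram.c -o empty_static
perf stat ./empty_static

With ld.so's relocation and symbol-resolution work out of the picture, the instruction count should drop substantially, though glibc's own static startup code still runs.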
This is not the sort of microbenchmark that implementors are likely to optimize for. What would be of interest is if you could identify "expensive" aspects of fork/exec that in some use cases are never used, and so might be #ifdef'd out (or have their execution short-circuited) in very specific situations. Lazy evaluation of resolv.conf is one example: a process that never talks to IP servers never pays that overhead.
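To estimate the floor with the C runtime removed entirely (again a sketch, assuming x86-64 Linux, where _start is the linker's default entry symbol and syscall 60 is exit):

echo 'void _start(void){ __asm__("mov $60, %eax; xor %edi, %edi; syscall"); }' > bare.c
gcc -O3 -nostdlib -static bare.c -o bare
perf stat ./bare

A binary like this measures little more than exec() and page-fault overhead, which is roughly what the 142-byte hello example is already showing.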
Upvotes: 2