Lei Yu

Execution time of C++ code on Linux for the first time is extremely slow

I believe many people have experienced this. It always takes much longer to execute C++ code for the first time on Linux.

For example, calling ::clock_gettime(CLOCK_REALTIME, &ts); for the first time is about five times slower than the third time on my Linux box.
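A minimal sketch of how per-call timings like these can be collected. The helper name `time_first_calls` and the wrapper `read_realtime` are hypothetical. The timer is `std::chrono::steady_clock`, which on Linux typically goes through `clock_gettime` itself, so part of the first-call cost may already be paid by the timer; treat the numbers as indicative only.

```cpp
#include <time.h>     // clock_gettime, CLOCK_REALTIME, timespec (POSIX)
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: time each of the first `n` calls to `f`, in nanoseconds.
template <typename F>
std::vector<std::int64_t> time_first_calls(F&& f, int n) {
    assert(n >= 0);
    using steady = std::chrono::steady_clock;
    std::vector<std::int64_t> out;
    out.reserve(static_cast<std::size_t>(n));  // no reallocation inside the timed loop
    for (int i = 0; i < n; ++i) {
        const auto start = steady::now();
        f();
        const auto stop = steady::now();
        out.push_back(static_cast<std::int64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count()));
    }
    return out;
}

// The call from the question, wrapped so it can be passed to the helper.
inline void read_realtime(timespec& ts) {
    ::clock_gettime(CLOCK_REALTIME, &ts);
}
```

Usage: `timespec ts{}; auto t = time_first_calls([&] { read_realtime(ts); }, 3);` then compare `t[0]`, `t[1]` and `t[2]`.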

Allocating memory for the first time is 100 times slower than the second time.

I tried pre-allocation and used mlockall in my application, but even so, the first execution of one function is about 160 times slower than the second, which in turn is about twice as slow as the third.
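A minimal sketch of the two measures mentioned here: `mlockall` plus pre-faulting a pre-allocated buffer, so that page faults happen before the timed code rather than during it. The helper names are hypothetical. `mlockall` usually fails for unprivileged processes when `RLIMIT_MEMLOCK` is small, so its result should be checked. As the question shows, this alone may not remove the first-call penalty.

```cpp
#include <sys/mman.h>  // mlockall, MCL_CURRENT, MCL_FUTURE
#include <unistd.h>    // sysconf, _SC_PAGESIZE
#include <cassert>
#include <cstddef>

// Lock all current and future pages of the process in RAM. Returns false on
// failure, e.g. when RLIMIT_MEMLOCK is too small for an unprivileged process.
inline bool lock_all_memory() {
    return mlockall(MCL_CURRENT | MCL_FUTURE) == 0;
}

// Hypothetical helper: touch one byte per page (read it and write it back) so
// each page is faulted in before the timed code runs. Contents are preserved.
inline void prefault(void* data, std::size_t len) {
    const long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    volatile unsigned char* p = static_cast<volatile unsigned char*>(data);
    for (std::size_t off = 0; off < len; off += static_cast<std::size_t>(page)) {
        p[off] = p[off];
    }
    if (len > 0) p[len - 1] = p[len - 1];  // the last byte may sit on one more page
}
```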

The pseudocode of the function is below. msg is allocated on the heap, but that allocation is not included in the time measurement. Msg2 is a POD type, so there is no memory allocation at all in slow_for_the_first_time.

void slow_for_the_first_time(Message* msg) {
    Msg2 msg2;
    // set msg2 using msg
    ....
}

Just wondering: what could cause the slowness of the first execution? And are there ways to avoid it?

erenon's answer helps a lot. I think it's probably because Msg2 is defined in a shared (.so) library.

Before using LD_BIND_NOW=1, the first execution took around 8000 nanoseconds, the second around 500 nanoseconds, and the third around 200 nanoseconds.

Now the first execution takes around 2000 nanoseconds, while the second and third remain unchanged. So the first is still 10 times slower than the third execution; there must be other factors affecting the first execution time.

Some interesting findings:

Calling the method below before slow_for_the_first_time reduces the first execution time by roughly another microsecond:

void dummySet(Msg2& msg2)
{
    // Set all fields of msg2. msg2 has about 30 fields; it doesn't help if only one field is set.
}
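A minimal sketch of one way to write every field of a POD struct in a single statement, assuming `Msg2` is an aggregate. The struct layout below is a hypothetical stand-in for the real ~30-field type. Assigning a value-initialized object zeroes all members, so the helper does not need updating when fields are added. Why touching all fields first speeds up the later call is not established here; warming the same stack and cache lines is one plausible factor.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the question's Msg2 (the real one has ~30 fields).
struct Msg2 {
    std::int64_t id;
    double price;
    std::int32_t qty;
    char symbol[16];
};
static_assert(std::is_trivially_copyable<Msg2>::value, "Msg2 is assumed to be POD-like");

// Write every field in one statement: assigning a value-initialized Msg2
// zeroes all members, so adding fields later needs no change here.
inline void dummySet(Msg2& msg2) {
    msg2 = Msg2{};
}
```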

Another thing worth mentioning is that the slowness of the first execution is definitely not related to msg, because the second call to slow_for_the_first_time in the code below

char buffer[sizeof(Message)];
memset(buffer, 0, sizeof(buffer));
slow_for_the_first_time((Message*)buffer);//calling the method with a dummy buffer.
.....
slow_for_the_first_time(msg);//calling the method for the second time with a real msg.

is as fast as the second call to slow_for_the_first_time in the code below:

slow_for_the_first_time(msg);//the first time takes around 2000 nanoseconds
.....
slow_for_the_first_time(msg);//the second time takes around 500 nanoseconds.
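The dummy-buffer warm-up in the first snippet casts a `char` array to `Message*`. A `char` array is not guaranteed to be suitably aligned for `Message`, and no `Message` object exists in it, so accessing it through the cast is undefined behavior. A minimal sketch of the same warm-up with a value-initialized object, assuming `Message` is default-constructible. The struct layouts and the function body are hypothetical stand-ins, and the function returns `msg2` only so the sketch is observable.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the question's types.
struct Message { std::int64_t id; double price; std::int32_t qty; };
struct Msg2    { std::int64_t id; double price; std::int32_t qty; };

// Stand-in body for "set msg2 using msg".
inline Msg2 slow_for_the_first_time(Message* msg) {
    Msg2 msg2{};
    msg2.id = msg->id;
    msg2.price = msg->price;
    msg2.qty = msg->qty;
    return msg2;
}

// Warm-up call with a zero-initialized, correctly aligned Message object
// instead of a cast char buffer.
inline void warm_up() {
    Message dummy{};
    // volatile sink keeps the call from being optimized away
    volatile std::int64_t sink = slow_for_the_first_time(&dummy).id;
    (void)sink;
}
```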

Answers (2)

YSC

In addition to the lazy linking erenon describes in their answer, there are two other factors behind slow execution on the first run: a cold cache and cold branch prediction.

Specifically, the speedup for subsequent calls comes from:

  • the external symbols: once the dynamic linker has resolved a symbol, it stays resolved for the lifetime of the program, and the lookup is practically free after that;
  • data: when the CPU processes data, that data is temporarily kept in the CPU cache. Loading memory into the cache is a costly operation, but once the data is there, it is quickly available the next time, because the cache is very fast memory close to the CPU. You can read this other answer about caches.
  • the CPU: branch prediction significantly speeds up execution by trying to predict which way the code branches. The predictor needs warming up as well. Here is an excellent answer about branch prediction.
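A minimal sketch of the classic demonstration of branch prediction: the same loop over the same values, once in random order and once sorted. On random bytes the `if` is unpredictable; on sorted data it is taken in one long run and then not taken, which the predictor learns quickly. The function names are hypothetical, and at higher optimization levels the compiler may emit branchless code, which hides the difference.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Sum elements >= 128. The `if` is the branch the predictor has to guess.
inline std::int64_t sum_large(const std::vector<int>& v) {
    std::int64_t sum = 0;
    for (int x : v) {
        if (x >= 128) sum += x;
    }
    return sum;
}

// Random values in [0, 255]: the branch above is taken about half the time, unpredictably.
inline std::vector<int> random_bytes(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<int> v(n);
    for (int& x : v) x = dist(gen);
    return v;
}

// Same values, sorted: the branch becomes easy to predict.
inline std::vector<int> sorted_copy(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}
```

Timing `sum_large(v)` against `sum_large(sorted_copy(v))` with any timer usually shows the sorted version running faster on the same data, when the compiler keeps the branch.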

In short, code tends to be slow when first executed. If this is a problem, the solutions are:

  • LD_BIND_NOW, to link at startup;
  • cache warmup;
  • branch prediction warmup.
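A minimal sketch of cache warm-up: read one byte per cache line of the data the latency-critical code will use, shortly before it runs. The 64-byte line size is an assumption (typical on current x86-64 and many ARM cores), and the helper name is hypothetical. Warm-up only helps if nothing else evicts the data in between; for branch prediction warm-up, the usual approach is to run the hot path a few times with representative dummy input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size in bytes.
constexpr std::size_t kCacheLine = 64;

// Hypothetical helper: read one byte per cache line so the buffer is pulled into
// the CPU caches (as far as it fits). The volatile reads cannot be optimized out;
// the checksum is returned so the caller can use it if desired.
inline std::uint64_t warm_cache(const void* data, std::size_t len) {
    const volatile unsigned char* p = static_cast<const volatile unsigned char*>(data);
    std::uint64_t sum = 0;
    for (std::size_t off = 0; off < len; off += kCacheLine) {
        sum += p[off];
    }
    if (len > 0) sum += p[len - 1];  // the last byte may sit on one more line
    return sum;
}
```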

erenon

Dynamically linked symbols need to be looked up in the set of dynamically loaded symbols the first time you reference them. To check whether this is really the issue, run:

$ LD_BIND_NOW=1 ./your_program

LD_BIND_NOW instructs the dynamic linker to fix the address of every entry in the GOT and PLT at startup. This makes startup slightly slower, but in exchange it may resolve the "first call is slow" issue.
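A minimal sketch of checking, from inside the program, whether it was launched with LD_BIND_NOW. glibc's dynamic linker treats the variable as set when it is a non-empty string, and reads it only at startup. Setting it later from the running process (for example with `setenv`) therefore does not change how that process binds its symbols. The helper name is hypothetical. The check also cannot tell whether the binary itself was linked for eager binding.

```cpp
#include <cassert>
#include <cstdlib>

// True if the environment requests eager symbol binding (LD_BIND_NOW non-empty).
// Only meaningful for the value present when the process started.
inline bool bind_now_requested() {
    const char* v = std::getenv("LD_BIND_NOW");
    return v != nullptr && v[0] != '\0';
}
```

A program can log this at startup, so latency measurements are known to have been taken with or without eager binding.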

If this proves to be the problem, you can try statically linking libraries or prelinking.
