Reputation: 123
I was optimizing some C code and noticed something odd.
I found a piece of dead code (not referenced anywhere) and deleted it, but performance then dropped by around 4%. This is not measurement noise: I benchmarked it multiple times, at different times of day, over several days.
Besides that, I found that when I compile this project with a Makefile that feeds all the .c files to the compiler at once and produces one binary, performance is good. But when I split the code into modules, compiled them separately with CMake, and linked them together at the end, the runtime performance was clearly worse than the Makefile build. The platform, the compile options (always including -O3), and the GCC version are identical in both cases.
I inspected both binaries with reverse-engineering tools and found that the compiled code is almost identical, but the function order and virtual addresses are of course different.
So I am confused. My guess is that this is caused by code layout: code section size, code address alignment, etc.
Fortunately, I found an answer that confirms this: Why does GCC generate 15-20% faster code if I optimize for size instead of speed?. So alignment really does influence performance.
But are there any methods or tools to find the best way to align the code? I tried -falign-functions=32 -falign-loops=32, but it does not always help; sometimes it makes performance even worse. The same goes for -flto.
So, I am wondering whether we can actually know in which situations we should tell the compiler to align code and in which we shouldn't. Are there any tools that could help analyze this problem?
Upvotes: 1
Views: 69