Mys_721tx

Reputation: 423

Which C optimization practices for x86 that were historically recommended are no longer effective?

Due to advances in x86 C compilers (namely GCC and Clang), many coding practices that were once believed to improve efficiency are no longer useful, since the compilers can do a better job of optimizing the code than humans can (e.g. bit shifts vs. multiplication).

Which specific practices are these?

Upvotes: 29

Views: 1402

Answers (4)

Sean Werkema

Reputation: 6145

One optimization that really shouldn't be used much anymore is #define (expanding on duskwuff's answer a bit).

The C preprocessor is a wonderful thing: it can do some amazing code transformations, and it can make certain really complex code much simpler. But using #define just to get a small operation inlined isn't usually appropriate anymore. Most modern compilers have a real inline keyword (or an equivalent, like __inline__), and they're smart enough to inline most static functions anyway, which means that code like this:

#define sum(x, y) ((x) + (y))

is really better written as the equivalent function:

static int sum(int x, int y)
{
    return x + y;
}

You avoid dangerous multiple-evaluation problems and side effects, you get compiler type-checking, and you end up with cleaner code, too. If it's worth inlining, the compiler will do it.
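
For example, the classic MAX macro evaluates one of its arguments twice, which goes wrong as soon as an argument has a side effect; the function version doesn't have this problem. A small sketch:

#include <stdio.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))  /* evaluates the winning argument twice */

static int max_fn(int a, int b)            /* the inline-friendly alternative */
{
    return a > b ? a : b;
}

int main(void)
{
    int i = 5;
    int m = MAX(i++, 3);    /* expands to ((i++) > (3) ? (i++) : (3)): i++ runs twice */
    printf("macro: m = %d, i = %d\n", m, i);     /* m = 6, i = 7, not m = 5, i = 6 */

    int j = 5;
    int n = max_fn(j++, 3);                      /* j++ is evaluated exactly once */
    printf("function: n = %d, j = %d\n", n, j);  /* n = 5, j = 6 */
    return 0;
}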

In general, save the preprocessor for the circumstances where it's needed: Emitting a lot of complex, variant code or partial code quickly. Using the preprocessor for inlining small functions and defining constants is mostly an antipattern now.

Upvotes: 0

One such practice is to avoid multiplications by using arrays of array pointers instead of real 2D arrays.


Old practice:

int width = 1234, height = 5678;
int* buffer = malloc(width*height*sizeof(*buffer));
int** image = malloc(height*sizeof(*image));
for(int i = height; i--; ) image[i] = &buffer[i*width];

//Now do some heavy computations with image[y][x].

This used to be faster because multiplications were very expensive (on the order of 30 CPU cycles), while memory accesses were virtually free (caches were only added in the 1990s, when memory could no longer keep up with full CPU speed).


But multiplication became fast, some CPUs being able to do it in a single cycle, while memory accesses did not keep pace at all. So this code is now likely to be more performant:

int width = 1234, height = 5678;
int (*image)[width] = malloc(height*sizeof(*image));

//Now do some heavy computations with image[y][x],
//which will invoke pointer arithmetic to calculate the offset as (y*width + x)*sizeof(int).

There are still some CPUs around on which the second version is not faster, but the big multiplication penalty is no longer with us.
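
A nice bonus of the pointer-to-VLA version is that the whole image is a single allocation (one malloc, one free), and C99's variably modified parameter types let you pass it to functions cleanly. A minimal sketch (sum_pixels is an illustrative name; variably modified types are mandatory in C99 but optional in C11):

#include <stdio.h>
#include <stdlib.h>

static long long sum_pixels(int height, int width, int image[height][width])
{
    long long sum = 0;
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            sum += image[y][x];   // one multiply-add per access, no pointer table
    return sum;
}

int main(void)
{
    int width = 1234, height = 5678;
    int (*image)[width] = calloc(height, sizeof(*image));  // zero-initialized image
    if (!image) return 1;
    printf("%lld\n", sum_pixels(height, width, image));    // prints 0
    free(image);   // a single allocation means a single free
    return 0;
}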

Upvotes: 8

jdehaan

Reputation: 19938

Given the plurality of platforms, you can at best optimize for a given platform (or CPU architecture/model) and compiler. If your code has to run on many platforms, micro-optimization is a waste of time. (I'm talking about micro-optimizations here; better algorithms are always worth considering.)

That said, optimizing for a given platform, e.g. a DSP, makes sense when the need arises. Then the best first helper is, IMHO, the judicious use of the restrict keyword, if the compiler/optimizer supports it well. Avoid algorithms involving conditions and jumpy code (break, goto, if, while, ...); this favors streaming and avoids too many bad branch predictions. I would agree these hints are common sense now.
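
For example, restrict promises the compiler that the pointers cannot alias, which frees it to keep values in registers, reorder accesses, and vectorize. A minimal sketch (the function and its parameters are illustrative):

#include <stddef.h>

// With restrict, the compiler may assume dst and src never overlap,
// so the loop can be vectorized without runtime aliasing checks.
void scale(float *restrict dst, const float *restrict src, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = k * src[i];   // branch-free, streaming inner loop
}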

Generally speaking, I would say: any change that contorts the code based on assumptions about how the compiler optimizes should, IMHO, be avoided altogether.

Rather, switch to assembly (a common practice for some really important algorithms on DSPs, where the compilers, while really great, still miss the last few percent of CPU/memory-cycle performance...)

Upvotes: 4

user149341

Reputation:

Of the optimizations which are commonly recommended, a couple which are basically never fruitful given modern compilers include:

Mathematical transformations

Modern compilers understand mathematics, and will perform transformations on mathematical expressions where appropriate.

Optimizations such as conversion of multiplication to addition, or constant multiplication or division to bit shifting, are already performed by modern compilers, even at low optimization levels. Examples of these optimizations include:

x * 2  ->  x + x
x * 2  ->  x << 1

Note that some specific cases may differ. For instance, x >> 1 is not the same as x / 2 for negative x: division truncates toward zero, while an arithmetic right shift rounds toward negative infinity. It is not appropriate to substitute one for the other!
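
A small demonstration (assuming a typical two's-complement target such as x86, where >> on a negative int is an arithmetic shift; strictly speaking this is implementation-defined):

#include <stdio.h>

int main(void)
{
    int x = -3;
    printf("x / 2  = %d\n", x / 2);   // -1: signed division truncates toward zero
    printf("x >> 1 = %d\n", x >> 1);  // -2: arithmetic shift rounds toward -infinity
    return 0;
}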

Additionally, many of these suggested optimizations aren't actually any faster than the code they replace.

Stupid code tricks

I'm not even sure what to call this, but tricks like XOR swapping (a ^= b; b ^= a; a ^= b;) are not optimizations at all. They're just party tricks: they are slower, and more fragile, than the obvious approach. Don't use them.
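
Worse than being slow, the XOR swap breaks outright when the two lvalues alias. A hypothetical xor_swap helper called with the same object for both arguments zeroes it instead of swapping:

#include <stdio.h>

// Hypothetical helper: if a and b point to the same object, it is zeroed.
static void xor_swap(int *a, int *b)
{
    *a ^= *b;   // if a == b, *a becomes 0 here...
    *b ^= *a;   // ...and stays 0
    *a ^= *b;
}

int main(void)
{
    int x = 42;
    xor_swap(&x, &x);
    printf("%d\n", x);   // prints 0, not 42; a temp-variable swap is harmless here
    return 0;
}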

The register keyword

This keyword is ignored by many modern compilers, as its intended meaning (a hint that a variable be kept in a register) is not meaningful given current register allocation algorithms.
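
About all register still does in standard C is forbid taking the variable's address, so today it mostly just gets in the way. A minimal sketch:

int main(void)
{
    register int counter = 0;
    // int *p = &counter;   // error: the address of a register variable
    //                      // may not be taken (C11 6.5.3.2)
    for (register int i = 0; i < 10; i++)   // the hint itself is typically ignored
        counter += i;
    return counter;   // 45
}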

Code transformations

Compilers will automatically perform a wide variety of code transformations where appropriate. Several such transformations which are frequently recommended for manual application, but which are rarely useful when applied thus, include:

  • Loop unrolling. (This is often actually harmful when applied indiscriminately, as it bloats code size.)
  • Function inlining. (Tag a function as static, and it'll usually be inlined where appropriate when optimization is enabled; see the sketch below.)
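
As a quick illustration of both points: in code like the following, GCC and Clang at -O2 will typically inline the static helper on their own, and at higher optimization levels may unroll or vectorize the loop as they see fit. A minimal sketch (names are illustrative):

#include <stddef.h>

static int square(int x)   // small static function: inlined where appropriate
{
    return x * x;
}

long long sum_of_squares(const int *v, size_t n)
{
    long long sum = 0;
    for (size_t i = 0; i < n; i++)   // left alone, compilers unroll/vectorize this
        sum += square(v[i]);
    return sum;
}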

Upvotes: 18
