Mark

Reputation: 333

How much speed gain is there from using __INLINE__?

My understanding is that INLINE can speed up code execution. Is that correct?

How much speed can we gain from it?

Upvotes: 0

Views: 1536

Answers (3)

Nominal Animal

Reputation: 39426

The reason I use the inline function specifier (specifically, static inline) is not "speed", but the following:

  1. The static part tells the compiler that the function is visible only in the current translation unit (the file being compiled plus the header files it includes)

  2. The inline part tells the compiler that it may substitute the implementation of the function at the call site, if it chooses to

  3. static inline tells the compiler that it can omit the function entirely if it is never used in the current translation unit

    (Specifically, the compiler and options I use most, gcc -Wall, issue a warning if a function marked static is unused, but not if it is marked static inline; see the sketch after this list.)

  4. static inline tells us humans that the function is a macro-like helper function, while also giving us the type checking that a macro lacks.
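As an illustration of point 3, consider the following sketch (the file name warn_demo.c and the helper names are arbitrary). Compiled with gcc -Wall -c warn_demo.c, the unused static function draws an unused-function warning, while the unused static inline one does not:

/* warn_demo.c */

static int unused_plain(int x)
{
    return x + 1;      /* gcc -Wall warns: 'unused_plain' defined but not used */
}

static inline int unused_helper(int x)
{
    return x + 1;      /* no warning; the unused definition is simply dropped */
}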

Thus, in my opinion, the assumption that inline has anything to do with speed per se is incorrect, and answering the stated question with a straight number would be misleading.


In my code, you will see them associated with some data structures, or occasionally with global variables.

A typical example is when I want to implement a Xorshift pseudorandom number generator in my own C code:

#include <inttypes.h>

static uint64_t  prng_state = 1; /* Any nonzero uint64_t seed is okay */

static inline uint64_t  prng_u64(void)
{
    uint64_t  state;

    state = prng_state;
    state ^= state >> 12;
    state ^= state << 25;
    state ^= state >> 27;
    prng_state = state;

    return state * UINT64_C(2685821657736338717);
}

The static uint64_t prng_state = 1; means that prng_state is a variable of type uint64_t, visible only in the current compilation unit, and initialized to 1. The prng_u64() function returns an unsigned 64-bit pseudorandom integer. However, if you do not use prng_u64(), the compiler will not generate code for it either.
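For example, a small test program exercising the generator could look like the following sketch (the seed value and the number of values printed are arbitrary):

#include <stdio.h>
#include <inttypes.h>

/* Generator state and function exactly as defined above. */
static uint64_t  prng_state = 1; /* Any nonzero uint64_t seed is okay */

static inline uint64_t  prng_u64(void)
{
    uint64_t  state;

    state = prng_state;
    state ^= state >> 12;
    state ^= state << 25;
    state ^= state >> 27;
    prng_state = state;

    return state * UINT64_C(2685821657736338717);
}

int main(void)
{
    /* Re-seed with any nonzero value, if desired. */
    prng_state = UINT64_C(20240101);

    for (int i = 0; i < 4; i++)
        printf("%" PRIu64 "\n", prng_u64());

    return 0;
}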

Another typical use case is when I have data structures, and they need accessor functions. For example,

#ifndef   GRID_H
#define   GRID_H
#include <stdlib.h>
#include <stdio.h>

typedef struct {
    int            rows;
    int            cols;
    unsigned char *cell;
} grid;
#define  GRID_INIT { 0, 0, NULL }

#define  GRID_OUTSIDE -1

static inline int grid_get(grid *const g, const int row, const int col)
{
    if (!g || row < 0 || col < 0 || row >= g->rows || col >= g->cols)
        return GRID_OUTSIDE;
    return g->cell[row * (size_t)(g->cols) + col];
}

static inline int grid_set(grid *const g, const int row, const int col,
                           const unsigned char value)
{
    if (!g || row < 0 || col < 0 || row >= g->rows || col >= g->cols)
        return GRID_OUTSIDE;
    return g->cell[row * (size_t)(g->cols) + col] = value;
}

static inline void grid_init(grid *g)
{
    g->rows = 0;
    g->cols = 0;
    g->cell = NULL;
}

static inline void grid_free(grid *g)
{
    free(g->cell);
    g->rows = 0;
    g->cols = 0;
    g->cell = NULL;
}

int grid_create(grid *g, const int rows, const int cols,
                const unsigned char initial_value);

int grid_load(grid *g, FILE *handle);

int grid_save(grid *g, FILE *handle);

#endif /* GRID_H */

That header file defines some useful helper functions, and declares the functions grid_create(), grid_load(), and grid_save(), which would be implemented in a separate .c file.

(Yes, those three functions could be implemented in the header file just as well, but that would make the header file quite large. In a large project, spread over many translation units (.c source files), each translation unit that included the header file would get its own local copies of the functions. The accessor functions defined as static inline above are short and trivial, so it is perfectly okay for them to be copied here and there. The three functions I omitted are much larger.)
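For example, a program using just the header could look like the following sketch. It fills in the grid fields by hand with calloc(), only because grid_create() lives in the separate .c file and is not shown here:

#include <stdio.h>
#include <stdlib.h>
#include "grid.h"

int main(void)
{
    grid  g = GRID_INIT;

    /* Stand-in for grid_create(&g, 4, 4, 0), whose implementation is not shown above. */
    g.rows = 4;
    g.cols = 4;
    g.cell = calloc((size_t)g.rows * (size_t)g.cols, sizeof *g.cell);
    if (!g.cell)
        return EXIT_FAILURE;

    grid_set(&g, 2, 3, 7);

    printf("cell(2,3) = %d\n", grid_get(&g, 2, 3));  /* prints 7 */
    printf("cell(9,9) = %d\n", grid_get(&g, 9, 9));  /* prints -1 (GRID_OUTSIDE) */

    grid_free(&g);
    return EXIT_SUCCESS;
}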

Upvotes: 1

Andrejs Cainikovs

Reputation: 28474

Ripped from here:

Yes and no. Sometimes. Maybe.

There are no simple answers. inline functions might make the code faster, they might make it slower. They might make the executable larger, they might make it smaller. They might cause thrashing, they might prevent thrashing. And they might be, and often are, totally irrelevant to speed.

inline functions might make it faster: As shown above, procedural integration might remove a bunch of unnecessary instructions, which might make things run faster.

inline functions might make it slower: Too much inlining might cause code bloat, which might cause "thrashing" on demand-paged virtual-memory systems. In other words, if the executable size is too big, the system might spend most of its time going out to disk to fetch the next chunk of code.

inline functions might make it larger: This is the notion of code bloat, as described above. For example, if a system has 100 inline functions each of which expands to 100 bytes of executable code and is called in 100 places, that's an increase of 1MB. Is that 1MB going to cause problems? Who knows, but it is possible that that last 1MB could cause the system to "thrash," and that could slow things down.

inline functions might make it smaller: The compiler often generates more code to push/pop registers/parameters than it would by inline-expanding the function's body. This happens with very small functions, and it also happens with large functions when the optimizer is able to remove a lot of redundant code through procedural integration — that is, when the optimizer is able to make the large function small.

inline functions might cause thrashing: Inlining might increase the size of the binary executable, and that might cause thrashing.

inline functions might prevent thrashing: The working set size (number of pages that need to be in memory at once) might go down even if the executable size goes up. When f() calls g(), the code is often on two distinct pages; when the compiler procedurally integrates the code of g() into f(), the code is often on the same page.

inline functions might increase the number of cache misses: Inlining might cause an inner loop to span across multiple lines of the memory cache, and that might cause thrashing of the memory-cache.

inline functions might decrease the number of cache misses: Inlining usually improves locality of reference within the binary code, which might decrease the number of cache lines needed to store the code of an inner loop. This ultimately could cause a CPU-bound application to run faster.

inline functions might be irrelevant to speed: Most systems are not CPU-bound. Most systems are I/O-bound, database-bound or network-bound, meaning the bottleneck in the system's overall performance is the file system, the database or the network. Unless your "CPU meter" is pegged at 100%, inline functions probably won't make your system faster. (Even in CPU-bound systems, inline will help only when used within the bottleneck itself, and the bottleneck is typically in only a small percentage of the code.)

There are no simple answers: You have to play with it to see what is best. Do not settle for simplistic answers like, "Never use inline functions" or "Always use inline functions" or "Use inline functions if and only if the function is less than N lines of code." These one-size-fits-all rules may be easy to write down, but they will produce sub-optimal results.

Copyright (C) Marshall Cline

Upvotes: 6

alinsoar

Reputation: 15813

Using inline makes the compiler use the substitution model of evaluation, but this substitution is not guaranteed to happen every time. When it does happen, the generated code will be longer and may be faster; however, when optimizations are active, the substituted code is not always faster.
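For example, with a trivial helper such as this hypothetical square(), the compiler may substitute x * x directly at each call site, or it may emit an ordinary call; the standard leaves that choice to the implementation:

#include <stdio.h>

static inline int square(int x)
{
    return x * x;
}

int main(void)
{
    int sum = 0;

    for (int i = 0; i < 10; i++)
        sum += square(i);   /* may be expanded in place as sum += i * i */

    printf("%d\n", sum);    /* prints 285 */
    return 0;
}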

Upvotes: 2
