fouric
fouric

Reputation: 1758

(How) Can I inline a particular function call?

Let's say that I have a function that gets called in multiple parts of a program. Let's also say that I have a particular call to that function that is in an extremely performance-sensitive section of code (e.g., a loop that iterates tens of millions of times and where each microsecond counts). Is there a way that I can force the complier (gcc in my case) to inline that single, particular function call, without inlining the others?

EDIT: Let me make this completely clear: this question is NOT about forcing gcc (or any other compiler) to inline all calls to a function; rather, it it about requesting that the compiler inline a particular call to a function.

Upvotes: 43

Views: 8959

Answers (8)

blakeh
blakeh

Reputation: 101

If you do not mind having two names for the same function, you could create a small wrapper around your function to "block" the always_inline attribute from affecting every call. In my example, loop_inlined would be the name you would use in performance-critical sections, while the plain loop would be used everywhere else.

inline.h

#include <stdlib.h>

static inline int loop_inlined() __attribute__((always_inline));
int loop();

static inline int loop_inlined() {
    int n = 0, i;
    for(i = 0; i < 10000; i++) 
        n += rand();
    return n;
}

inline.c

#include "inline.h"

int loop() {
    return loop_inlined();
}

main.c

#include "inline.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
    printf("%d\n", loop_inlined());
    printf("%d\n", loop());
    return 0;
}

This works regardless of the optimization level. Compiling with gcc inline.c main.c on Intel gives:

4011e6:       c7 44 24 18 00 00 00    movl   $0x0,0x18(%esp)
4011ed:       00
4011ee:       eb 0e                   jmp    4011fe <_main+0x2e>
4011f0:       e8 5b 00 00 00          call   401250 <_rand>
4011f5:       01 44 24 1c             add    %eax,0x1c(%esp)
4011f9:       83 44 24 18 01          addl   $0x1,0x18(%esp)
4011fe:       81 7c 24 18 0f 27 00    cmpl   $0x270f,0x18(%esp)
401205:       00
401206:       7e e8                   jle    4011f0 <_main+0x20>
401208:       8b 44 24 1c             mov    0x1c(%esp),%eax
40120c:       89 44 24 04             mov    %eax,0x4(%esp)
401210:       c7 04 24 60 30 40 00    movl   $0x403060,(%esp)
401217:       e8 2c 00 00 00          call   401248 <_printf>
40121c:       e8 7f ff ff ff          call   4011a0 <_loop>
401221:       89 44 24 04             mov    %eax,0x4(%esp)
401225:       c7 04 24 60 30 40 00    movl   $0x403060,(%esp)
40122c:       e8 17 00 00 00          call   401248 <_printf>

The first 7 instructions are the inlined call, and the regular call happens 5 instructions later.

Upvotes: 3

Enzo Ferber
Enzo Ferber

Reputation: 3094

There's a kernel source that uses #defines in a very interesting way to define several different named functions with the same body. This solves the problem of having two different functions to maintain. (I forgot which one it was...). My idea is based on this same principle.

The way to use the defines is that you'll define the inline function on the compilation unit you need it. To demonstrate the method I'll use a simple function:

int add(int a, int b);

It works like this: you make a function generator #define in a header file and declare the function prototype of the normal version of the function (the one not inlined).

Then you declare two separate function generators, one for the normal function and one for the inline function. The inline function you declare as static __inline__. When you need to call the inline function in one of your files, you use the generator define to get the source for it. In all other files you need to use the normal function, you just include the header with the prototype.

The code was tested on:

Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz
Kernel Version: 3.16.0-49-generic
GCC 4.8.4

Code is worth more than a thousand words, so:

File Hierarchy

+
| Makefile
| add.h
| add.c
| loop.c
| loop2.c
| loop3.c
| loops.h
| main.c

add.h

#define GENERATE_ADD(type, prefix)  \
    type int prefix##add(int a, int b) { return a + b; }

#define DEFINE_ADD()            GENERATE_ADD(,)
#define DEFINE_INLINE_ADD()     GENERATE_ADD(static __inline__, inline_)

int add(int, int);

This doesn't look nice, but cuts the work of maintaining two different functions. The function is fully defined within the GENERATE_ADD(type,prefix) macro, so if you ever need to change the function, you change this macro and everything else changes.

Next, DEFINE_ADD() will be called from add.c to generate the normal version of add. DEFINE_INLINE_ADD() will give you access to a function called inline_add, which has the same signature as your normal addfunction, but it has a different name (the inline_ prefix).

Note: I didn't use the __attribute((always_inline))__ when using the -O3 flag - the __inline__ did the job. However, if you don't wanna use -O3, use:

#define DEFINE_INLINE_ADD()     GENERATE_ADD(static __inline__ __attribute__((always_inline)), inline_)

add.c

#include "add.h"

DEFINE_ADD()

Simple call to the DEFINE_ADD() macro generator. This will declare the normal version of the function (the one that won't get inlined).

loop.c

#include <stdio.h>
#include "add.h"

DEFINE_INLINE_ADD()

int loop(void)
{

    register int i;

    for (i = 0; i < 100000; i++)
        printf("%d\n", inline_add(i + 1, i + 2));

    return 0;
}

Here in loop.c you can see the call to DEFINE_INLINE_ADD(). This gives this function access to the inline_add function. When you compile, all inline_add function will be inlined.

loop2.c

#include <stdio.h>
#include "add.h"

int loop2(void)
{
    register int i;

    for (i = 0; i < 100000; i++)
        printf("%d\n", add(i + 1, i + 2));

    return 0;
}

This is to show you can use the normal version of add normally from other files.

loop3.c

#include <stdio.h>
#include "add.h"

DEFINE_INLINE_ADD()

int loop3(void)
{

    register int i;

    printf ("add: %d\n", add(2,3));
    printf ("add: %d\n", add(4,5));
    for (i = 0; i < 100000; i++)
        printf("%d\n", inline_add(i + 1, i + 2));

    return 0;
}

This is to show that you can use both the functions in the same compilation unit, yet one of the functions will be inlined, and the other wont (see GDB disass bellow for details).

loops.h

/* prototypes for main */
int loop (void);
int loop2 (void);
int loop3 (void);

main.c

#include <stdio.h>
#include <stdlib.h>
#include "add.h"
#include "loops.h"

int main(void)
{
    printf("%d\n", add(1,2));
    printf("%d\n", add(2,3));

    loop();
    loop2();
    loop3();
    return 0;
}

Makefile

CC=gcc
CFLAGS=-Wall -pedantic --std=c11

main: add.o loop.o loop2.o loop3.o main.o
    ${CC} -o $@ $^ ${CFLAGS}

add.o: add.c 
    ${CC} -c $^ ${CFLAGS}

loop.o: loop.c
    ${CC} -c $^ -O3 ${CFLAGS}

loop2.o: loop2.c 
    ${CC} -c $^ ${CFLAGS}

loop3.o: loop3.c
    ${CC} -c $^ -O3 ${CFLAGS}

If you use the __attribute__((always_inline)) you can change the Makefile to:

CC=gcc
CFLAGS=-Wall -pedantic --std=c11

main: add.o loop.o loop2.o loop3.o main.o
    ${CC} -o $@ $^ ${CFLAGS}

%.o: %.c
    ${CC} -c $^ ${CFLAGS}

Compilation

$ make
gcc -c add.c -Wall -pedantic --std=c11
gcc -c loop.c -O3 -Wall -pedantic --std=c11
gcc -c loop2.c -Wall -pedantic --std=c11
gcc -c loop3.c -O3 -Wall -pedantic --std=c11
gcc -Wall -pedantic --std=c11   -c -o main.o main.c
gcc -o main add.o loop.o loop2.o loop3.o main.o -Wall -pedantic --std=c11

Disassembly

$ gdb main
(gdb) disass add

   0x000000000040059d <+0>: push   %rbp
   0x000000000040059e <+1>: mov    %rsp,%rbp
   0x00000000004005a1 <+4>: mov    %edi,-0x4(%rbp)
   0x00000000004005a4 <+7>: mov    %esi,-0x8(%rbp)
   0x00000000004005a7 <+10>:mov    -0x8(%rbp),%eax
   0x00000000004005aa <+13>:mov    -0x4(%rbp),%edx
   0x00000000004005ad <+16>:add    %edx,%eax
   0x00000000004005af <+18>:pop    %rbp
   0x00000000004005b0 <+19>:retq   

(gdb) disass loop

   0x00000000004005c0 <+0>: push   %rbx
   0x00000000004005c1 <+1>: mov    $0x3,%ebx
   0x00000000004005c6 <+6>: nopw   %cs:0x0(%rax,%rax,1)
   0x00000000004005d0 <+16>:mov    %ebx,%edx
   0x00000000004005d2 <+18>:xor    %eax,%eax
   0x00000000004005d4 <+20>:mov    $0x40079d,%esi
   0x00000000004005d9 <+25>:mov    $0x1,%edi
   0x00000000004005de <+30>:add    $0x2,%ebx
   0x00000000004005e1 <+33>:callq  0x4004a0 <__printf_chk@plt>
   0x00000000004005e6 <+38>:cmp    $0x30d43,%ebx
   0x00000000004005ec <+44>:jne    0x4005d0 <loop+16>
   0x00000000004005ee <+46>:xor    %eax,%eax
   0x00000000004005f0 <+48>:pop    %rbx
   0x00000000004005f1 <+49>:retq   

(gdb) disass loop2

   0x00000000004005f2 <+0>: push   %rbp
   0x00000000004005f3 <+1>: mov    %rsp,%rbp
   0x00000000004005f6 <+4>: push   %rbx
   0x00000000004005f7 <+5>: sub    $0x8,%rsp
   0x00000000004005fb <+9>: mov    $0x0,%ebx
   0x0000000000400600 <+14>:jmp    0x400625 <loop2+51>
   0x0000000000400602 <+16>:lea    0x2(%rbx),%edx
   0x0000000000400605 <+19>:lea    0x1(%rbx),%eax
   0x0000000000400608 <+22>:mov    %edx,%esi
   0x000000000040060a <+24>:mov    %eax,%edi
   0x000000000040060c <+26>:callq  0x40059d <add>
   0x0000000000400611 <+31>:mov    %eax,%esi
   0x0000000000400613 <+33>:mov    $0x400794,%edi
   0x0000000000400618 <+38>:mov    $0x0,%eax
   0x000000000040061d <+43>:callq  0x400470 <printf@plt>
   0x0000000000400622 <+48>:add    $0x1,%ebx
   0x0000000000400625 <+51>:cmp    $0x1869f,%ebx
   0x000000000040062b <+57>:jle    0x400602 <loop2+16>
   0x000000000040062d <+59>:mov    $0x0,%eax
   0x0000000000400632 <+64>:add    $0x8,%rsp
   0x0000000000400636 <+68>:pop    %rbx
   0x0000000000400637 <+69>:pop    %rbp
   0x0000000000400638 <+70>:retq   

(gdb) disass loop3

   0x0000000000400640 <+0>: push   %rbx
   0x0000000000400641 <+1>: mov    $0x3,%esi
   0x0000000000400646 <+6>: mov    $0x2,%edi
   0x000000000040064b <+11>:mov    $0x3,%ebx
   0x0000000000400650 <+16>:callq  0x40059d <add>
   0x0000000000400655 <+21>:mov    $0x400798,%esi
   0x000000000040065a <+26>:mov    %eax,%edx
   0x000000000040065c <+28>:mov    $0x1,%edi
   0x0000000000400661 <+33>:xor    %eax,%eax
   0x0000000000400663 <+35>:callq  0x4004a0 <__printf_chk@plt>
   0x0000000000400668 <+40>:mov    $0x5,%esi
   0x000000000040066d <+45>:mov    $0x4,%edi
   0x0000000000400672 <+50>:callq  0x40059d <add>
   0x0000000000400677 <+55>:mov    $0x400798,%esi
   0x000000000040067c <+60>:mov    %eax,%edx
   0x000000000040067e <+62>:mov    $0x1,%edi
   0x0000000000400683 <+67>:xor    %eax,%eax
   0x0000000000400685 <+69>:callq  0x4004a0 <__printf_chk@plt>
   0x000000000040068a <+74>:nopw   0x0(%rax,%rax,1)
   0x0000000000400690 <+80>:mov    %ebx,%edx
   0x0000000000400692 <+82>:xor    %eax,%eax
   0x0000000000400694 <+84>:mov    $0x40079d,%esi
   0x0000000000400699 <+89>:mov    $0x1,%edi
   0x000000000040069e <+94>:add    $0x2,%ebx
   0x00000000004006a1 <+97>:callq  0x4004a0 <__printf_chk@plt>
   0x00000000004006a6 <+102>:cmp    $0x30d43,%ebx
   0x00000000004006ac <+108>:jne    0x400690 <loop3+80>
   0x00000000004006ae <+110>:xor    %eax,%eax
   0x00000000004006b0 <+112>:pop    %rbx
   0x00000000004006b1 <+113>:retq   

Symbol table

$ objdump -t main | grep add
0000000000000000 l    df *ABS*  0000000000000000              add.c
000000000040059d g     F .text  0000000000000014              add

$ objdump -t main | grep loop
0000000000000000 l    df *ABS*  0000000000000000              loop.c
0000000000000000 l    df *ABS*  0000000000000000              loop2.c
0000000000000000 l    df *ABS*  0000000000000000              loop3.c
00000000004005c0 g     F .text  0000000000000032              loop
00000000004005f2 g     F .text  0000000000000047              loop2
0000000000400640 g     F .text  0000000000000072              loop3

$ objdump -t main | grep main
main:     file format elf64-x86-64
0000000000000000 l    df *ABS*  0000000000000000              main.c
0000000000000000       F *UND*  0000000000000000              __libc_start_main@@GLIBC_2.2.5
00000000004006b2 g     F .text  000000000000005a              main

$ objdump -t main | grep inline
$

Well, that's it. After 3 hours of banging my head in the keyboard trying to figure it out, this was the best I could come up with. Feel free to point any errors, I'll really appreciate it. I got really interested in this particular inline one function call.

Upvotes: 3

Ant
Ant

Reputation: 1708

Here's a suggestion, write the body of the code in a separate header file. Include the header file in place where it has to be inline and into a body in a C file for other calls.

void demo(void)
{
#include myBody.h
}

importantloop
{
    // code
#include myBody.h
    // code
}

Upvotes: 1

andygavin
andygavin

Reputation: 2884

The answer is it depends on your function, what you request and the nature of your function. Your best bet is to:

  • tell the compiler you want it inlined
  • make the function static (be careful with extern as it's semantics change a little in gcc in some modes)
  • set the compiler options to inform the optimizer you want inlining, and set inline limits appropriately
  • turn on any couldn't inline warnings on the compiler
  • verify the output (you could check the assembler generated) that the function is in-lined.

Compiler hints

The answers here cover just one side of inlining, the language hints to the compiler. When the standard says:

Making a function an inline function suggests that calls to the function be as fast as possible. The extent to which such suggestions are effective is implementation-defined

This can be the case for other stronger hints such as:

  • GNU's __attribute__((always_inline)): Generally, functions are not inlined unless optimization is specified. For functions declared inline, this attribute inlines the function even if no optimization level was specified.
  • Microsoft's __forceinline: The __forceinline keyword overrides the cost/benefit analysis and relies on the judgment of the programmer instead. Exercise caution when using __forceinline. Indiscriminate use of __forceinline can result in larger code with only marginal performance gains or, in some cases, even performance losses (due to increased paging of a larger executable, for example).

Even both of these would rely on the inlining being possible, and crucially on compiler flags. To work with inlined functions you also need to understand the optimisation settings of your compiler.

It may be worth saying inlining can also be used to provide replacements for existing functions just for the compilation unit you are in. This can be used when an approximate answers are good enough for your algorithm, or a result can be achieved in a faster way with local data-structures.

An inline definition provides an alternative to an external definition, which a translator may use to implement any call to the function in the same translation unit. It is unspecified whether a call to the function uses the inline definition or the external definition.

Some functions cannot be inlined

For example, for the GNU compiler functions that cannot be inlined are:

Note that certain usages in a function definition can make it unsuitable for inline substitution. Among these usages are: variadic functions, use of alloca, use of variable-length data types (see Variable Length), use of computed goto (see Labels as Values), use of nonlocal goto, and nested functions (see Nested Functions). Using -Winline warns when a function marked inline could not be substituted, and gives the reason for the failure.

So even always_inline may not do what you expect.

Compiler Options

Using C99's inline hints will rely on you instructing the compiler the inline behavour you are looking for.

GCC for instance has:

-fno-inline, -finline-small-functions, -findirect-inlining, -finline-functions, -finline-functions-called-once, -fearly-inlining, -finline-limit=n

Microsoft compiler also has options that dictate the effectiveness of inline. Some compilers will also allow optimization to take into account running profile.

I do think it's worth seeing inlining in the broader context of program optimization.

Preventing Inlining

You mention that you don't want certain functions inlined. This might be done by setting something like __attribute__((always_inline)) without turning on the optimizer. However you would probably would want the optimizer. One option here would be to hint you don't want it: __attribute__ ((noinline)). But why would this be the case?

Other forms of optimization

You may also consider how you might restructure your loop and avoiding branches. Branch prediction can have a dramatic effect. For an interesting discussion on this see: Why is it faster to process a sorted array than an unsorted array?

Then you also might smaller inner loops to be unrolled and to look at invariants.

Upvotes: 4

gavv
gavv

Reputation: 4893

It is possible to enable inlining per translation unit (but not per call). Though this is not an answer for the question and is an ugly trick, it conforms to C standard and may be interesting as related stuff.

The trick is to use extern definition where you do not want to inline, and extern inline where you need inlining.

Example:

$ cat func.h 
int func();

$ cat func.c 
int func() { return 10; }

$ cat func_inline.h 
extern inline int func() { return 5; }

$ cat main.c       
#include <stdio.h>

#ifdef USE_INLINE
# include "func_inline.h"
#else
# include "func.h"
#endif

int main() { printf("%d\n", func()); return 0; }

$ gcc main.c func.c && ./a.out
10                                                // non-inlined version

$ gcc main.c func.c -DUSE_INLINE && ./a.out
10                                                // non-inlined version

$ gcc main.c func.c -DUSE_INLINE -O2 && ./a.out
5                                                 // inlined!

You can also use non-standard attribute (e.g. __attribute__(always_inline)) in GCC) for extern inline definition, instead of relying on -O2.

BTW, the trick is used in glibc.

Upvotes: 18

user5159806
user5159806

Reputation: 92

I assume that your function is a little one since you want to inline it, if so why don't you write it in asm?

As for inlining only a specific call to a function I don't think there exists something to do this task for you. Once a function is declared as inline and if the compiler will inline it for you it will do it everywhere it sees a call to that function.

Upvotes: -3

abhoriel
abhoriel

Reputation: 103

the traditional way to force inline a function in C was to not use a function at all, but to use a function like macro. This method will always inline the function, but there are some problems with function like macros. For example:

#define ADD(x, y) ((x) + (y))
printf("%d\n", ADD(2, 2));

There is also the inline keyword, which was added to C in the C99 standard. Notably, Microsoft's Visual C compiler doesn't support C99, and thus you can't use inline with that (miserable) compiler. Inline only hints to the compiler that you want the function inlined - it does not guarantee it.

GCC has an extension which requires the compiler to inline the function.

inline __attribute__((always_inline)) int add(int x, int y) {
    return x + y;
}

To make this cleaner, you may want want to use a macro:

#define ALWAYS_INLINE inline __attribute__((always_inline))
ALWAYS_INLINE int add(int x, int y) {
    return x + y;
}

I don't know of a direct way of having a function that can be force inlined on certain calls. But you can combine the techniques like this:

#define ALWAYS_INLINE inline __attribute__((always_inline))
#define ADD(x, y) ((x) + (y))
ALWAYS_INLINE int always_inline_add(int x, int y) {
    return ADD(x, y);
}

int normal_add(int x, int y) {
    return ADD(x, y);
}

Or, you could just have this:

#define ADD(x, y) ((x) + (y))
int add(int x, int y) {
    return ADD(x, y);
}

int main() {
    printf("%d\n", ADD(2,2));    // always inline
    printf("%d\n", add(2,2));    // normal function call
    return 0;
}

Also, note that forcing the inline of a function might not make your code faster. Inline functions cause larger code to be generated, which might cause more cache misses to occur. I hope that helps.

Upvotes: 6

James Curran
James Curran

Reputation: 103515

In C (as opposed to C++) there's no standard way to suggest that a function should be inlined. It's only vender-specific extensions.

However you specify it, as far as I know the compiler will always try to inline every instance, so use that function only once:

original:

   int MyFunc()  { /* do stuff */  }

change to:

   inline int MyFunc_inlined()  { /* do stuff */  }

   int MyFunc()  { return MyFunc_inlined(); }

Now, in theplaces where you want it inlined, use MyFunc_inlined()

Note: "inline" keyword in the above is just a placeholder for whatever syntax gcc uses to force an inlining. If H2CO3's deleted answer is to be trusted, that would be:

static inline __attribute__((always_inline)) int MyFunc_inlined()  { /* do stuff */  }

Upvotes: 17

Related Questions