Reputation: 1758
Let's say that I have a function that gets called in multiple parts of a program. Let's also say that I have a particular call to that function that is in an extremely performance-sensitive section of code (e.g., a loop that iterates tens of millions of times and where each microsecond counts). Is there a way that I can force the compiler (gcc in my case) to inline that single, particular function call, without inlining the others?
EDIT: Let me make this completely clear: this question is NOT about forcing gcc (or any other compiler) to inline all calls to a function; rather, it is about requesting that the compiler inline a particular call to a function.
Upvotes: 43
Views: 8959
Reputation: 101
If you do not mind having two names for the same function, you could create a small wrapper around your function to "block" the always_inline attribute from affecting every call. In my example, loop_inlined would be the name you would use in performance-critical sections, while the plain loop would be used everywhere else.
/* inline.h */
#include <stdlib.h>
static inline int loop_inlined() __attribute__((always_inline));
int loop();
static inline int loop_inlined() {
    int n = 0, i;
    for(i = 0; i < 10000; i++)
        n += rand();
    return n;
}

/* inline.c */
#include "inline.h"
int loop() {
    return loop_inlined();
}

/* main.c */
#include "inline.h"
#include <stdio.h>
int main(int argc, char *argv[]) {
    printf("%d\n", loop_inlined());
    printf("%d\n", loop());
    return 0;
}
This works regardless of the optimization level. Compiling with gcc inline.c main.c on Intel gives:
4011e6: c7 44 24 18 00 00 00 movl $0x0,0x18(%esp)
4011ed: 00
4011ee: eb 0e jmp 4011fe <_main+0x2e>
4011f0: e8 5b 00 00 00 call 401250 <_rand>
4011f5: 01 44 24 1c add %eax,0x1c(%esp)
4011f9: 83 44 24 18 01 addl $0x1,0x18(%esp)
4011fe: 81 7c 24 18 0f 27 00 cmpl $0x270f,0x18(%esp)
401205: 00
401206: 7e e8 jle 4011f0 <_main+0x20>
401208: 8b 44 24 1c mov 0x1c(%esp),%eax
40120c: 89 44 24 04 mov %eax,0x4(%esp)
401210: c7 04 24 60 30 40 00 movl $0x403060,(%esp)
401217: e8 2c 00 00 00 call 401248 <_printf>
40121c: e8 7f ff ff ff call 4011a0 <_loop>
401221: 89 44 24 04 mov %eax,0x4(%esp)
401225: c7 04 24 60 30 40 00 movl $0x403060,(%esp)
40122c: e8 17 00 00 00 call 401248 <_printf>
The first 7 instructions are the inlined call, and the regular call happens 5 instructions later.
Upvotes: 3
Reputation: 3094
There's a kernel source that uses #defines in a very interesting way to define several different named functions with the same body. This solves the problem of having two different functions to maintain. (I forgot which one it was...). My idea is based on this same principle.
The way to use the defines is that you define the inline function in the compilation unit where you need it. To demonstrate the method I'll use a simple function:
int add(int a, int b);
It works like this: you make a function generator #define in a header file and declare the function prototype of the normal version of the function (the one not inlined).
Then you declare two separate function generators, one for the normal function and one for the inline function. The inline function you declare as static __inline__. When you need to call the inline function in one of your files, you use the generator define to get the source for it. In all other files where you need to use the normal function, you just include the header with the prototype.
The code was tested on:
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz
Kernel Version: 3.16.0-49-generic
GCC 4.8.4
Code is worth more than a thousand words, so:
+
| Makefile
| add.h
| add.c
| loop.c
| loop2.c
| loop3.c
| loops.h
| main.c
/* add.h */
#define GENERATE_ADD(type, prefix) \
    type int prefix##add(int a, int b) { return a + b; }
#define DEFINE_ADD() GENERATE_ADD(,)
#define DEFINE_INLINE_ADD() GENERATE_ADD(static __inline__, inline_)
int add(int, int);
This doesn't look nice, but cuts the work of maintaining two different functions. The function is fully defined within the GENERATE_ADD(type,prefix) macro, so if you ever need to change the function, you change this macro and everything else changes.
Next, DEFINE_ADD() will be called from add.c to generate the normal version of add. DEFINE_INLINE_ADD() will give you access to a function called inline_add, which has the same signature as your normal add function, but it has a different name (the inline_ prefix).
Note: I didn't use __attribute__((always_inline)) when using the -O3 flag; __inline__ did the job. However, if you don't want to use -O3, use:
#define DEFINE_INLINE_ADD() GENERATE_ADD(static __inline__ __attribute__((always_inline)), inline_)
/* add.c */
#include "add.h"
DEFINE_ADD()
A simple call to the DEFINE_ADD() macro generator. This defines the normal version of the function (the one that won't get inlined).
/* loop.c */
#include <stdio.h>
#include "add.h"
DEFINE_INLINE_ADD()
int loop(void)
{
    register int i;
    for (i = 0; i < 100000; i++)
        printf("%d\n", inline_add(i + 1, i + 2));
    return 0;
}
Here in loop.c you can see the call to DEFINE_INLINE_ADD(). This gives this file access to the inline_add function. When you compile, every call to inline_add will be inlined.
/* loop2.c */
#include <stdio.h>
#include "add.h"
int loop2(void)
{
    register int i;
    for (i = 0; i < 100000; i++)
        printf("%d\n", add(i + 1, i + 2));
    return 0;
}
This is to show that you can use the normal version of add as usual from other files.
/* loop3.c */
#include <stdio.h>
#include "add.h"
DEFINE_INLINE_ADD()
int loop3(void)
{
    register int i;
    printf ("add: %d\n", add(2,3));
    printf ("add: %d\n", add(4,5));
    for (i = 0; i < 100000; i++)
        printf("%d\n", inline_add(i + 1, i + 2));
    return 0;
}
This is to show that you can use both functions in the same compilation unit, yet one of them will be inlined and the other won't (see the GDB disassembly below for details).
/* loops.h: prototypes for main */
int loop (void);
int loop2 (void);
int loop3 (void);
/* main.c */
#include <stdio.h>
#include <stdlib.h>
#include "add.h"
#include "loops.h"
int main(void)
{
    printf("%d\n", add(1,2));
    printf("%d\n", add(2,3));
    loop();
    loop2();
    loop3();
    return 0;
}
# Makefile
CC=gcc
CFLAGS=-Wall -pedantic --std=c11
main: add.o loop.o loop2.o loop3.o main.o
	${CC} -o $@ $^ ${CFLAGS}
add.o: add.c
	${CC} -c $^ ${CFLAGS}
loop.o: loop.c
	${CC} -c $^ -O3 ${CFLAGS}
loop2.o: loop2.c
	${CC} -c $^ ${CFLAGS}
loop3.o: loop3.c
	${CC} -c $^ -O3 ${CFLAGS}
If you use the __attribute__((always_inline)) you can change the Makefile to:
CC=gcc
CFLAGS=-Wall -pedantic --std=c11
main: add.o loop.o loop2.o loop3.o main.o
	${CC} -o $@ $^ ${CFLAGS}
%.o: %.c
	${CC} -c $^ ${CFLAGS}
$ make
gcc -c add.c -Wall -pedantic --std=c11
gcc -c loop.c -O3 -Wall -pedantic --std=c11
gcc -c loop2.c -Wall -pedantic --std=c11
gcc -c loop3.c -O3 -Wall -pedantic --std=c11
gcc -Wall -pedantic --std=c11 -c -o main.o main.c
gcc -o main add.o loop.o loop2.o loop3.o main.o -Wall -pedantic --std=c11
$ gdb main
(gdb) disass add
0x000000000040059d <+0>: push %rbp
0x000000000040059e <+1>: mov %rsp,%rbp
0x00000000004005a1 <+4>: mov %edi,-0x4(%rbp)
0x00000000004005a4 <+7>: mov %esi,-0x8(%rbp)
0x00000000004005a7 <+10>:mov -0x8(%rbp),%eax
0x00000000004005aa <+13>:mov -0x4(%rbp),%edx
0x00000000004005ad <+16>:add %edx,%eax
0x00000000004005af <+18>:pop %rbp
0x00000000004005b0 <+19>:retq
(gdb) disass loop
0x00000000004005c0 <+0>: push %rbx
0x00000000004005c1 <+1>: mov $0x3,%ebx
0x00000000004005c6 <+6>: nopw %cs:0x0(%rax,%rax,1)
0x00000000004005d0 <+16>:mov %ebx,%edx
0x00000000004005d2 <+18>:xor %eax,%eax
0x00000000004005d4 <+20>:mov $0x40079d,%esi
0x00000000004005d9 <+25>:mov $0x1,%edi
0x00000000004005de <+30>:add $0x2,%ebx
0x00000000004005e1 <+33>:callq 0x4004a0 <__printf_chk@plt>
0x00000000004005e6 <+38>:cmp $0x30d43,%ebx
0x00000000004005ec <+44>:jne 0x4005d0 <loop+16>
0x00000000004005ee <+46>:xor %eax,%eax
0x00000000004005f0 <+48>:pop %rbx
0x00000000004005f1 <+49>:retq
(gdb) disass loop2
0x00000000004005f2 <+0>: push %rbp
0x00000000004005f3 <+1>: mov %rsp,%rbp
0x00000000004005f6 <+4>: push %rbx
0x00000000004005f7 <+5>: sub $0x8,%rsp
0x00000000004005fb <+9>: mov $0x0,%ebx
0x0000000000400600 <+14>:jmp 0x400625 <loop2+51>
0x0000000000400602 <+16>:lea 0x2(%rbx),%edx
0x0000000000400605 <+19>:lea 0x1(%rbx),%eax
0x0000000000400608 <+22>:mov %edx,%esi
0x000000000040060a <+24>:mov %eax,%edi
0x000000000040060c <+26>:callq 0x40059d <add>
0x0000000000400611 <+31>:mov %eax,%esi
0x0000000000400613 <+33>:mov $0x400794,%edi
0x0000000000400618 <+38>:mov $0x0,%eax
0x000000000040061d <+43>:callq 0x400470 <printf@plt>
0x0000000000400622 <+48>:add $0x1,%ebx
0x0000000000400625 <+51>:cmp $0x1869f,%ebx
0x000000000040062b <+57>:jle 0x400602 <loop2+16>
0x000000000040062d <+59>:mov $0x0,%eax
0x0000000000400632 <+64>:add $0x8,%rsp
0x0000000000400636 <+68>:pop %rbx
0x0000000000400637 <+69>:pop %rbp
0x0000000000400638 <+70>:retq
(gdb) disass loop3
0x0000000000400640 <+0>: push %rbx
0x0000000000400641 <+1>: mov $0x3,%esi
0x0000000000400646 <+6>: mov $0x2,%edi
0x000000000040064b <+11>:mov $0x3,%ebx
0x0000000000400650 <+16>:callq 0x40059d <add>
0x0000000000400655 <+21>:mov $0x400798,%esi
0x000000000040065a <+26>:mov %eax,%edx
0x000000000040065c <+28>:mov $0x1,%edi
0x0000000000400661 <+33>:xor %eax,%eax
0x0000000000400663 <+35>:callq 0x4004a0 <__printf_chk@plt>
0x0000000000400668 <+40>:mov $0x5,%esi
0x000000000040066d <+45>:mov $0x4,%edi
0x0000000000400672 <+50>:callq 0x40059d <add>
0x0000000000400677 <+55>:mov $0x400798,%esi
0x000000000040067c <+60>:mov %eax,%edx
0x000000000040067e <+62>:mov $0x1,%edi
0x0000000000400683 <+67>:xor %eax,%eax
0x0000000000400685 <+69>:callq 0x4004a0 <__printf_chk@plt>
0x000000000040068a <+74>:nopw 0x0(%rax,%rax,1)
0x0000000000400690 <+80>:mov %ebx,%edx
0x0000000000400692 <+82>:xor %eax,%eax
0x0000000000400694 <+84>:mov $0x40079d,%esi
0x0000000000400699 <+89>:mov $0x1,%edi
0x000000000040069e <+94>:add $0x2,%ebx
0x00000000004006a1 <+97>:callq 0x4004a0 <__printf_chk@plt>
0x00000000004006a6 <+102>:cmp $0x30d43,%ebx
0x00000000004006ac <+108>:jne 0x400690 <loop3+80>
0x00000000004006ae <+110>:xor %eax,%eax
0x00000000004006b0 <+112>:pop %rbx
0x00000000004006b1 <+113>:retq
$ objdump -t main | grep add
0000000000000000 l df *ABS* 0000000000000000 add.c
000000000040059d g F .text 0000000000000014 add
$ objdump -t main | grep loop
0000000000000000 l df *ABS* 0000000000000000 loop.c
0000000000000000 l df *ABS* 0000000000000000 loop2.c
0000000000000000 l df *ABS* 0000000000000000 loop3.c
00000000004005c0 g F .text 0000000000000032 loop
00000000004005f2 g F .text 0000000000000047 loop2
0000000000400640 g F .text 0000000000000072 loop3
$ objdump -t main | grep main
main: file format elf64-x86-64
0000000000000000 l df *ABS* 0000000000000000 main.c
0000000000000000 F *UND* 0000000000000000 __libc_start_main@@GLIBC_2.2.5
00000000004006b2 g F .text 000000000000005a main
$ objdump -t main | grep inline
$
Well, that's it. After 3 hours of banging my head on the keyboard trying to figure it out, this was the best I could come up with. Feel free to point out any errors, I'll really appreciate it. I got really interested in this problem of inlining one particular function call.
Upvotes: 3
Reputation: 1708
Here's a suggestion: write the body of the code in a separate header file. Include the header file in the place where it has to be inline, and into the body of a function in a C file for the other calls.
void demo(void)
{
    #include "myBody.h"
}
void importantloop(void)
{
    // code
    #include "myBody.h"
    // code
}
Upvotes: 1
Reputation: 2884
The answer is it depends on your function, what you request and the nature of your function. Your best bet is to:
Compiler hints
The answers here cover just one side of inlining: the language hints to the compiler. The standard says:
Making a function an inline function suggests that calls to the function be as fast as possible. The extent to which such suggestions are effective is implementation-defined
The same can hold for other, stronger hints such as:
__attribute__((always_inline)): Generally, functions are not inlined unless optimization is specified. For functions declared inline, this attribute inlines the function even if no optimization level was specified.
__forceinline: The __forceinline keyword overrides the cost/benefit analysis and relies on the judgment of the programmer instead. Exercise caution when using __forceinline. Indiscriminate use of __forceinline can result in larger code with only marginal performance gains or, in some cases, even performance losses (due to increased paging of a larger executable, for example).
Even these two rely on inlining being possible, and crucially on compiler flags. To work with inlined functions you also need to understand the optimisation settings of your compiler.
It may be worth saying that inlining can also be used to provide replacements for existing functions just for the compilation unit you are in. This can be used when approximate answers are good enough for your algorithm, or when a result can be achieved in a faster way with local data structures.
An inline definition provides an alternative to an external definition, which a translator may use to implement any call to the function in the same translation unit. It is unspecified whether a call to the function uses the inline definition or the external definition.
Some functions cannot be inlined
For example, for the GNU compiler, the usages that prevent a function from being inlined are:
Note that certain usages in a function definition can make it unsuitable for inline substitution. Among these usages are: variadic functions, use of alloca, use of variable-length data types (see Variable Length), use of computed goto (see Labels as Values), use of nonlocal goto, and nested functions (see Nested Functions). Using -Winline warns when a function marked inline could not be substituted, and gives the reason for the failure.
So even always_inline may not do what you expect.
Compiler Options
Using C99's inline hints relies on you telling the compiler which inline behaviour you are looking for.
GCC, for instance, has: -fno-inline, -finline-small-functions, -findirect-inlining, -finline-functions, -finline-functions-called-once, -fearly-inlining, -finline-limit=n
Microsoft's compiler also has options that dictate the effectiveness of inline. Some compilers also allow optimization to take a running profile into account.
I do think it's worth seeing inlining in the broader context of program optimization.
Preventing Inlining
You mention that you don't want certain functions inlined. This might be done by setting something like __attribute__((always_inline)) without turning on the optimizer. However, you would probably want the optimizer. One option here would be to hint that you don't want inlining: __attribute__((noinline)). But why would this be the case?
Other forms of optimization
You may also consider how you might restructure your loop and avoiding branches. Branch prediction can have a dramatic effect. For an interesting discussion on this see: Why is it faster to process a sorted array than an unsorted array?
You might also want smaller inner loops to be unrolled, and look at loop invariants.
Upvotes: 4
Reputation: 4893
It is possible to enable inlining per translation unit (but not per call). Though this is not an answer to the question and is an ugly trick, it conforms to the C standard and may be interesting as related material.
The trick is to use an extern definition where you do not want to inline, and extern inline where you need inlining.
Example:
$ cat func.h
int func();
$ cat func.c
int func() { return 10; }
$ cat func_inline.h
extern inline int func() { return 5; }
$ cat main.c
#include <stdio.h>
#ifdef USE_INLINE
# include "func_inline.h"
#else
# include "func.h"
#endif
int main() { printf("%d\n", func()); return 0; }
$ gcc main.c func.c && ./a.out
10 // non-inlined version
$ gcc main.c func.c -DUSE_INLINE && ./a.out
10 // non-inlined version
$ gcc main.c func.c -DUSE_INLINE -O2 && ./a.out
5 // inlined!
You can also use a non-standard attribute (e.g. __attribute__((always_inline)) in GCC) for the extern inline definition, instead of relying on -O2.
BTW, the trick is used in glibc.
Upvotes: 18
Reputation: 92
I assume that your function is a little one since you want to inline it, if so why don't you write it in asm?
As for inlining only a specific call to a function I don't think there exists something to do this task for you. Once a function is declared as inline and if the compiler will inline it for you it will do it everywhere it sees a call to that function.
Upvotes: -3
Reputation: 103
The traditional way to force inline a function in C was to not use a function at all, but to use a function-like macro. This method will always inline the function, but there are some problems with function-like macros. For example:
#define ADD(x, y) ((x) + (y))
printf("%d\n", ADD(2, 2));
There is also the inline keyword, which was added to C in the C99 standard. Notably, Microsoft's Visual C compiler doesn't support C99, and thus you can't use inline with that (miserable) compiler. Inline only hints to the compiler that you want the function inlined - it does not guarantee it.
GCC has an extension which requires the compiler to inline the function.
inline __attribute__((always_inline)) int add(int x, int y) {
return x + y;
}
To make this cleaner, you may want to use a macro:
#define ALWAYS_INLINE inline __attribute__((always_inline))
ALWAYS_INLINE int add(int x, int y) {
return x + y;
}
I don't know of a direct way of having a function that can be force inlined on certain calls. But you can combine the techniques like this:
#define ALWAYS_INLINE inline __attribute__((always_inline))
#define ADD(x, y) ((x) + (y))
ALWAYS_INLINE int always_inline_add(int x, int y) {
return ADD(x, y);
}
int normal_add(int x, int y) {
return ADD(x, y);
}
Or, you could just have this:
#include <stdio.h>
#define ADD(x, y) ((x) + (y))
int add(int x, int y) {
    return ADD(x, y);
}
int main() {
    printf("%d\n", ADD(2,2)); // always inline
    printf("%d\n", add(2,2)); // normal function call
    return 0;
}
Also, note that forcing the inline of a function might not make your code faster. Inline functions cause larger code to be generated, which might cause more cache misses to occur. I hope that helps.
Upvotes: 6
Reputation: 103515
In C (as opposed to C++) there's no standard way to suggest that a function should be inlined. It's only vendor-specific extensions.
However you specify it, as far as I know the compiler will always try to inline every instance, so use that function only once:
original:
int MyFunc() { /* do stuff */ }
change to:
inline int MyFunc_inlined() { /* do stuff */ }
int MyFunc() { return MyFunc_inlined(); }
Now, in the places where you want it inlined, use MyFunc_inlined()
Note: "inline" keyword in the above is just a placeholder for whatever syntax gcc uses to force an inlining. If H2CO3's deleted answer is to be trusted, that would be:
static inline __attribute__((always_inline)) int MyFunc_inlined() { /* do stuff */ }
Upvotes: 17