Reputation: 871

What do the CFI directives mean? (and some more questions)

OK, this is going to be a long question. I'm trying to understand how a buffer overflow works. I am reading "Smashing the Stack for Fun and Profit" by Aleph One and have just generated the assembly for the following code:

void function(int a, int b, int c) {
   char buffer1[5];
   char buffer2[10];
}

void main() {
  function(1,2,3);
}

The assembly output from GCC's -S flag gives me:

    .file   "example1.c"
    .text
    .globl  function
    .type   function, @function
function:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $48, %rsp
    movl    %edi, -36(%rbp)
    movl    %esi, -40(%rbp)
    movl    %edx, -44(%rbp)
    movq    %fs:40, %rax
    movq    %rax, -8(%rbp)
    xorl    %eax, %eax
    movq    -8(%rbp), %rax
    xorq    %fs:40, %rax
    je  .L2
    call    __stack_chk_fail
.L2:
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   function, .-function
    .globl  main
    .type   main, @function
main:
.LFB1:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $3, %edx
    movl    $2, %esi
    movl    $1, %edi
    call    function
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE1:
    .size   main, .-main
    .ident  "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2"
    .section    .note.GNU-stack,"",@progbits

The .cfi directives are not in the paper by Aleph1, and I guess that they were not used back then. I have read this question on SO and I get that they are used by GCC for exception handling. I have also read another question on SO and I get that .LFB0, .LFE0, .LFB1 and .LFE1 are labels. However, I have the following doubts:

I get that .cfi directives are used for exception handling; however, I don't understand what they mean. I have been here and I see some definitions like:

.cfi_def_cfa register, offset

.cfi_def_cfa defines a rule for computing CFA as: take address from register and add offset to it.

However, if you look at the assembly listing above, you won't find any register name (like EAX, EBX and so on); instead you find a number (I have generally found '6'), and I don't see how that is supposed to be a register. In particular, can anyone explain what .cfi_def_cfa_offset 16, .cfi_offset 6, -16, .cfi_def_cfa_register 6 and .cfi_def_cfa 7, 8 mean? Also, what does CFA mean? I am asking because in most books and papers the procedure prolog looks like this:

 pushl %ebp
 movl %esp,%ebp
 subl $20,%esp

However, now I think the procedure prolog in modern computers is as follows:

    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $48, %rsp

Initially I thought that the CFI directives were used instead of the sub instruction to set the offset, but that's not the case; the sub instruction is still used alongside the CFI directives.

I understood that there are labels for each procedure. However, why are there multiple labels nested inside a procedure? In my case main has the .LFB1 and .LFE1 labels. What is the need for multiple labels? Similarly, the procedure function has the labels .LFB0, .L2 and .LFE0.
The last 3 lines of both procedures seem to perform some housekeeping (telling the size of the procedure, maybe?), but I am not sure what they mean. Can anyone explain what they mean and what they are used for?

EDIT:

(adding one more question)

Do the CFI directives take up any space? In the procedure "function", each int parameter takes up 4 bytes and there are 3 of them, so the parameters take 12 bytes in memory. Next, the first char array takes 8 bytes (5 bytes rounded up to 8), and the second char array takes 12 bytes (10 bytes rounded up to 12), so the char arrays take 20 bytes in total. Summing these, the parameters and local variables need only 12+20=32 bytes.

But in the procedure "function", the compiler subtracts 48 bytes to store values. Why?

Upvotes: 24

Answers (3)

blabb

Reputation: 9007

Lindydancer has answered what CFI (call frame information) and CFA (the canonical frame address, in DWARF terms) mean.

.L<num> denotes a label. According to various tidbits found via Google, GCC on x64 names its local labels in this format: they start with .L and end with a number, so .L1, .L2, and so on are all labels.

According to Google and some earlier SO answers, FB<num> indicates function begin and FE<num> indicates function end,

so .LFB0, .LFB1, ... and .LFE0, .LFE1, ...

mark where each function begins and ends. The compiler probably needs them for its own internal purposes, so you can ignore them for now unless there is a pressing need to dig into compiler internals.

The other label, .L2, is the target of the je branch instruction in your function:

je  .L2

Also, compilers generally align and pad arguments and locals to some boundary.

I can't be sure, but I think the default alignment for GCC on x64 is 16 bytes, so suppose you request an odd-sized reservation like

char foo[5] or
BYTE blah[10]

The sizes 5 and 10 are not aligned, even for x86.

For 5 an x86 compiler will reserve 8 bytes, and for 10 it will reserve 16 bytes.

Likewise, x64 GCC might reserve 16 bytes for each of your requests.

You really shouldn't worry about why the compiler does what it does.

When you are trying to understand the logic of the assembly, just concentrate on the addresses.

If the compiler decides to put x at rbp +/- X, it will access it at that same location throughout the scope or lifetime of that variable.

Upvotes: 7

phorgan1

Reputation: 1744

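The 48 is to skip over both the arguments and the locals. Note that in the listing the three int arguments are stored with movl at -36(%rbp), -40(%rbp) and -44(%rbp), so they take 4 bytes each, and the stack-protector value sits at -8(%rbp). The lowest slot in use is therefore 44 bytes below %rbp, and GCC keeps the frame size a multiple of 16 to preserve the stack alignment the x86-64 ABI requires, which rounds 44 up to 48. You can see where everything lands in gdb just by asking for the address of each of those things.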

Upvotes: 2

Lindydancer

Reputation: 26114

CFI stands for call frame information. It's the way the compiler describes what happens in a function. It can be used by the debugger to present a call stack, by the linker to synthesise exception tables, for stack depth analysis and other things like that.

Effectively, it describes where resources such as processor registers are stored and where the return address is.

CFA stands for canonical frame address (the term used by DWARF), which is the value the stack pointer had in the calling function just before the call. It is needed to find the next frame on the stack.

Upvotes: 22
