Jay

How to properly use the libdwarf information to get the local variable location

Preface: I apologize for the lengthy setup. I wanted this post to be self-contained and to include all of the relevant information that I found.

My question relates to this post by Mr. Eli Bendersky: https://eli.thegreenplace.net/2011/02/07/how-debuggers-work-part-3-debugging-information

Therefore, I will be using the input code below for my question:

#include <stdio.h>
void do_stuff(int my_arg)
{
    int my_local = my_arg + 2;
    int i;

    for (i = 0; i < my_local; ++i)
        printf("i = %d\n", i);
}
int main()
{
    do_stuff(2);
    return 0;
}

The code above is compiled with gcc -g tracedprog2.c -o tracedprog2

In addition, I will use the libdwarf examples from https://github.com/timsnyder/libdwarf-code/tree/3e75142a5d8938466e00a942c41a04f69510915d, which can be built with the following steps (this is not required to follow the question; I am sharing it in case anyone wants to replicate my findings):

cd libdwarf-code
mkdir build && cd build
cmake -DBUILD_DWARFEXAMPLE=TRUE ..
make -j4
# built binaries will be available in the directory: $HOME/libdwarf-code/build/src/bin/dwarfexample

The question is as stated in the title: how do you use the information gathered by libdwarf to get the location of a local variable?

As described in Mr. Bendersky's post, the first step is to dump the DWARF debugging information with objdump --dwarf=info ./tracedprog2, which prints output like this (I only included the entries that are relevant):

<1><8a>: Abbrev Number: 5 (DW_TAG_subprogram)
    <8b>   DW_AT_external    : 1
    <8b>   DW_AT_name        : (indirect string, offset: 0x29): do_stuff
...
    <92>   DW_AT_low_pc      : 0x1135
    <9a>   DW_AT_high_pc     : 0x43
    <a2>   DW_AT_frame_base  : 1 byte block: 9c         (DW_OP_call_frame_cfa)
    <a4>   DW_AT_GNU_all_tail_call_sites: 1
...
 <2><b3>: Abbrev Number: 7 (DW_TAG_variable)
    <b4>   DW_AT_name        : (indirect string, offset: 0x0): my_local
...
    <bb>   DW_AT_type        : <0x57>
    <bf>   DW_AT_location    : 2 byte block: 91 68      (DW_OP_fbreg: -24)

My understanding is that two pieces of information (shown as DWARF opcodes) are needed to find the location of a local variable:

  1. The function's frame base: DW_OP_call_frame_cfa
  2. The variable's offset from the frame base: DW_OP_fbreg
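The two-byte DW_AT_location block 91 68 from the dump above can be decoded by hand: 0x91 is the DW_OP_fbreg opcode and 0x68 is its signed LEB128 operand. Below is a minimal sketch in C of that decoding. The helper names decode_sleb128 and parse_fbreg_block are made up for illustration and are not libdwarf functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode value of DW_OP_fbreg as defined by the DWARF standard. */
#define OP_FBREG 0x91

/* Decode one signed LEB128 value from buf and report how many bytes
 * were consumed. Hypothetical helper, not part of libdwarf. */
int64_t decode_sleb128(const uint8_t *buf, size_t len, size_t *used)
{
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte = 0;
    size_t i = 0;

    while (i < len) {
        byte = buf[i++];
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
        if ((byte & 0x80) == 0)   /* high bit clear: last byte */
            break;
    }
    /* Sign-extend when bit 6 of the last byte is set. */
    if (shift < 64 && (byte & 0x40))
        result |= ~(uint64_t)0 << shift;
    *used = i;
    return (int64_t)result;
}

/* Return 1 and set *offset when block is exactly "DW_OP_fbreg <sleb128>". */
int parse_fbreg_block(const uint8_t *block, size_t len, int64_t *offset)
{
    size_t used = 0;

    if (len < 2 || block[0] != OP_FBREG)
        return 0;
    *offset = decode_sleb128(block + 1, len - 1, &used);
    return used == len - 1;
}
```

Feeding it the bytes {0x91, 0x68} yields an offset of -24, which is the value objdump prints next to DW_OP_fbreg.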

Here is where things get tricky. The DWARF 5 standard (https://dwarfstd.org/doc/DWARF5.pdf) states:

The DW_OP_call_frame_cfa operation pushes the value of the CFA, obtained from the Call Frame Information (see Section 6.4 on page 171)

This is the Call Frame Information that the frame1 binary from the dwarfexample directory shared above (https://github.com/timsnyder/libdwarf-code/tree/3e75142a5d8938466e00a942c41a04f69510915d/src/bin/dwarfexample) parses into a readable format.

Running ./frame1 tracedprog2 prints the frame description entries (FDEs) together with the common information entry (CIE) that each one refers to. I found that readelf -w ./tracedprog2 gives more readable output, so below is the frame information for the function do_stuff, the focal point of this question, in that format:

00000088 000000000000001c 0000005c FDE cie=00000030 pc=0000000000001135..0000000000001178
  DW_CFA_advance_loc: 1 to 0000000000001136
  DW_CFA_def_cfa_offset: 16
  DW_CFA_offset: r6 (rbp) at cfa-16
  DW_CFA_advance_loc: 3 to 0000000000001139
  DW_CFA_def_cfa_register: r6 (rbp)
  DW_CFA_advance_loc: 62 to 0000000000001177
  DW_CFA_def_cfa: r7 (rsp) ofs 8
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop

According to the DWARF 5 standard,

15. DW_CFA_def_cfa takes two unsigned LEB128 arguments representing a
register number and an offset. The required action is to define the
current CFA rule to use the provided register and offset.
16. DW_CFA_def_cfa_register takes a single unsigned LEB128 argument
representing a register number. The required action is to define the
current CFA rule to use the provided register (but to keep the old
offset).
17. DW_CFA_def_cfa_offset takes a single unsigned LEB128 argument
representing an offset. The required action is to define the current CFA
rule to use the provided offset (but to keep the old register).
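The three instructions quoted above all update a single "register + offset" rule. Here is a minimal sketch in C of that state, assuming the x86-64 register numbering shown in the readelf output (7 = rsp, 6 = rbp). The struct and function names are illustrative only and do not come from libdwarf.

```c
#include <stdint.h>

/* Current CFA rule: CFA = value of register `reg` + `offset`. */
struct cfa_rule {
    unsigned reg;     /* DWARF register number */
    uint64_t offset;  /* unsigned offset added to the register */
};

/* DW_CFA_def_cfa: replace both the register and the offset. */
void def_cfa(struct cfa_rule *rule, unsigned reg, uint64_t offset)
{
    rule->reg = reg;
    rule->offset = offset;
}

/* DW_CFA_def_cfa_register: replace the register, keep the old offset. */
void def_cfa_register(struct cfa_rule *rule, unsigned reg)
{
    rule->reg = reg;
}

/* DW_CFA_def_cfa_offset: replace the offset, keep the old register. */
void def_cfa_offset(struct cfa_rule *rule, uint64_t offset)
{
    rule->offset = offset;
}
```

Note that none of these instructions adds to or subtracts from the previous offset; each one overwrites part of the rule.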

The important information seems to be the values of DW_CFA_def_cfa and DW_CFA_def_cfa_register, which I think might give the frame base I'm looking for.

Therefore, to get the location of the variable my_local, here is what I think needs to be done:

First, the CFA is RSP + 8, as defined by DW_CFA_def_cfa. Next, DW_CFA_offset is cfa - 16, which makes it RSP - 8? Then there is DW_CFA_def_cfa_offset: 16, which seems to suggest that I need to add 16, i.e. RSP - 8 + 16, giving RSP + 8. Then, because of DW_CFA_def_cfa_register: r6 (rbp), RSP changes to RBP, so it is now RBP + 8. Finally, adding the DW_OP_fbreg: -24 of my_local gives RBP - 0x10. However, in the objdump output I see -0x14(%rbp),%eax:

0000000000001135 <do_stuff>:
    1135:       55                      push   %rbp
    1136:       48 89 e5                mov    %rsp,%rbp
    1139:       48 83 ec 20             sub    $0x20,%rsp
    113d:       89 7d ec                mov    %edi,-0x14(%rbp)
    1140:       8b 45 ec                mov    -0x14(%rbp),%eax
    1143:       83 c0 02                add    $0x2,%eax
    1146:       89 45 f8                mov    %eax,-0x8(%rbp)

I believe I found all of the information needed to calculate the local variable's location, but I seem to be missing something. Could anyone please let me know what it might be? Thank you in advance.

Upvotes: 2

Answers (1)

Jay

It turns out that I misread the output of objdump -S ./tracedprog2, which led me to the wrong conclusion stated above.

Dumping with -S interleaves the disassembly with the source code like this:

void do_stuff(int my_arg)                                                                                                                                     
{                                                                                                                                                             
    1135:       55                      push   %rbp                                                                                                           
    1136:       48 89 e5                mov    %rsp,%rbp                                                                                                      
    1139:       48 83 ec 20             sub    $0x20,%rsp                                                                                                     
    113d:       89 7d ec                mov    %edi,-0x14(%rbp)
    int my_local = my_arg + 2;                                                 
    1140:       8b 45 ec                mov    -0x14(%rbp),%eax
    1143:       83 c0 02                add    $0x2,%eax
    1146:       89 45 f8                mov    %eax,-0x8(%rbp)
    int i;                                                                     
                                                                               
    for (i = 0; i < my_local; ++i)
    1149:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
    1150:       eb 1a                   jmp    116c <do_stuff+0x37>
        printf("i = %d\n", i);                                                 
    1152:       8b 45 fc                mov    -0x4(%rbp),%eax
    1155:       89 c6                   mov    %eax,%esi
    1157:       48 8d 3d a6 0e 00 00    lea    0xea6(%rip),%rdi        # 2004 <_IO_stdin_used+0x4>
    115e:       b8 00 00 00 00          mov    $0x0,%eax
    1163:       e8 c8 fe ff ff          callq  1030 <printf@plt>
    for (i = 0; i < my_local; ++i)

As you can see, the source line that declares my_local sits right above the instruction

1140: 8b 45 ec mov -0x14(%rbp),%eax

which made me believe that I had to reproduce the offset in -0x14(%rbp). That instruction actually loads my_arg, which was spilled to the stack at 0x113d; my_local itself is written by the store to -0x8(%rbp) at 0x1146.

After reading several additional sources, which took a while to find, I think I now have a good idea of what is happening (I cite them below in case anyone wants to verify my answer).

Let me expand the information shown above to clarify my understanding and show how I reached the solution:

00000000 0000000000000014 00000000 CIE                                                                                                                        
  Version:               1                                                                                                                                    
  Augmentation:          "zR"                                                                                                                                 
  Code alignment factor: 1                                                                                                                                    
  Data alignment factor: -8                                                                                                                                   
  Return address column: 16                                                                                                                                   
  Augmentation data:     1b                                                                                                                                   
  DW_CFA_def_cfa: r7 (rsp) ofs 8                                                                                                                              
  DW_CFA_offset: r16 (rip) at cfa-8                                                                                                                           
  DW_CFA_undefined: r16 (rip) 
...
00000088 000000000000001c 0000005c FDE cie=00000030 pc=0000000000001135..0000000000001178
  DW_CFA_advance_loc: 1 to 0000000000001136
  DW_CFA_def_cfa_offset: 16
  DW_CFA_offset: r6 (rbp) at cfa-16
  DW_CFA_advance_loc: 3 to 0000000000001139
  DW_CFA_def_cfa_register: r6 (rbp)

Contents of the .debug_loc section:

    Offset   Begin            End              Expression
    00000000 0000000000001178 0000000000001179 (DW_OP_breg7 (rsp): 8)
    00000014 0000000000001179 000000000000117c (DW_OP_breg7 (rsp): 16)
    00000028 000000000000117c 000000000000118c (DW_OP_breg6 (rbp): 16)
    0000003c 000000000000118c 000000000000118d (DW_OP_breg7 (rsp): 8)
    00000050 <End of list>
    00000060 0000000000001135 0000000000001136 (DW_OP_breg7 (rsp): 8)
    00000074 0000000000001136 0000000000001139 (DW_OP_breg7 (rsp): 16)
    00000088 0000000000001139 0000000000001177 (DW_OP_breg6 (rbp): 16)
    0000009c 0000000000001177 0000000000001178 (DW_OP_breg7 (rsp): 8)
    000000b0 <End of list>

The additional information above (the .debug_loc contents) can be obtained by compiling with DWARF 2, for example with gcc's -gdwarf-2 option, rather than the default DWARF 5 (Source: https://blog.tartanllama.xyz/writing-a-linux-debugger-variables/).
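Each .debug_loc entry pairs a half-open address range with a "register + offset" expression, so finding the frame base for a given program counter is a range lookup. The sketch below hard-codes the four do_stuff entries from the listing above. The loc_entry struct and find_frame_base function are assumptions made for illustration, not libdwarf types.

```c
#include <stddef.h>
#include <stdint.h>

/* One location-list entry: for begin <= pc < end, the frame base is
 * the value of DWARF register `reg` plus `offset` (DW_OP_breg<reg>). */
struct loc_entry {
    uint64_t begin;
    uint64_t end;
    unsigned reg;     /* 7 = rsp, 6 = rbp on x86-64 */
    int64_t  offset;
};

/* The do_stuff entries transcribed from the .debug_loc dump. */
static const struct loc_entry do_stuff_frame_base[] = {
    { 0x1135, 0x1136, 7,  8 },
    { 0x1136, 0x1139, 7, 16 },
    { 0x1139, 0x1177, 6, 16 },
    { 0x1177, 0x1178, 7,  8 },
};

/* Return the entry covering pc, or NULL when no range matches. */
const struct loc_entry *find_frame_base(const struct loc_entry *list,
                                        size_t count, uint64_t pc)
{
    for (size_t i = 0; i < count; i++) {
        if (pc >= list[i].begin && pc < list[i].end)
            return &list[i];
    }
    return NULL;
}
```

Looking up 0x1146, the address of the store into my_local, returns the rbp + 16 entry.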

At first, the CFA is defined as rsp + 8 by the DW_CFA_def_cfa: r7 (rsp) ofs 8 instruction in the CIE (Source: https://lists.dwarfstd.org/pipermail/dwarf-discuss/2010-August/000915.html).

Then, for the do_stuff frame that starts at address 0x1135, DW_CFA_advance_loc inserts a new row in the FDE's table at 0x1136 (that is what the value 1 represents). From this row on, the CFA offset is 16 because of the DW_CFA_def_cfa_offset instruction.

What does this mean? Instead of rsp + 8 as we saw earlier, from 0x1136 until the next row begins the CFA is rsp + 16.

Next, we create a new row after adding 3 to the current address (i.e., at 0x1139), and from there DW_CFA_def_cfa_register changes the CFA register to rbp. Because the offset has not changed, this means that from 0x1139 onward the CFA is rbp + 16 instead of rsp + 16.

That's basically it: the frame base we need in order to locate the local variable my_local is rbp + 16. The contents of the .debug_loc section shown above agree with this explanation.
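The row-by-row walk above can be sketched as a small replay loop in C. The instruction list is transcribed by hand from the readelf output (the CIE's initial DW_CFA_def_cfa followed by the do_stuff FDE). The enum, struct, and function names are hypothetical. DW_CFA_offset is left out because it describes where rbp was saved, not how the CFA is computed, and the code alignment factor is 1, so the advance deltas are used unscaled.

```c
#include <stddef.h>
#include <stdint.h>

enum cfa_op { ADVANCE_LOC, DEF_CFA, DEF_CFA_REGISTER, DEF_CFA_OFFSET };

struct cfa_insn { enum cfa_op op; uint64_t a; uint64_t b; };
struct cfa_rule { unsigned reg; uint64_t offset; };

static const struct cfa_insn do_stuff_program[] = {
    { DEF_CFA, 7, 8 },           /* CIE: DW_CFA_def_cfa: r7 (rsp) ofs 8 */
    { ADVANCE_LOC, 1, 0 },       /* to 0x1136 */
    { DEF_CFA_OFFSET, 16, 0 },   /* DW_CFA_def_cfa_offset: 16 */
    { ADVANCE_LOC, 3, 0 },       /* to 0x1139 */
    { DEF_CFA_REGISTER, 6, 0 },  /* DW_CFA_def_cfa_register: r6 (rbp) */
    { ADVANCE_LOC, 62, 0 },      /* to 0x1177 */
    { DEF_CFA, 7, 8 },           /* DW_CFA_def_cfa: r7 (rsp) ofs 8 */
};

/* Replay the program from start_pc and return the CFA rule in effect at pc. */
struct cfa_rule cfa_rule_at(const struct cfa_insn *prog, size_t count,
                            uint64_t start_pc, uint64_t pc)
{
    struct cfa_rule rule = { 0, 0 };
    uint64_t loc = start_pc;

    for (size_t i = 0; i < count; i++) {
        switch (prog[i].op) {
        case ADVANCE_LOC:
            /* The rule built so far covers [loc, loc + delta). */
            if (pc < loc + prog[i].a)
                return rule;
            loc += prog[i].a;
            break;
        case DEF_CFA:
            rule.reg = (unsigned)prog[i].a;
            rule.offset = prog[i].b;
            break;
        case DEF_CFA_REGISTER:
            rule.reg = (unsigned)prog[i].a;
            break;
        case DEF_CFA_OFFSET:
            rule.offset = prog[i].a;
            break;
        }
    }
    return rule;
}
```

Calling cfa_rule_at with start_pc 0x1135 and any pc inside the function body, for example 0x1146, gives register 6 (rbp) with offset 16.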

Now go back to the DIE for my_local:

 <2><b3>: Abbrev Number: 7 (DW_TAG_variable)
    <b4>   DW_AT_name        : (indirect string, offset: 0x0): my_local
...
    <bb>   DW_AT_type        : <0x57>
    <bf>   DW_AT_location    : 2 byte block: 91 68      (DW_OP_fbreg: -24)

The location is DW_OP_fbreg: -24, so you simply add this value to the frame base we found: rbp + 16 - 24 = rbp - 8, which matches the store mov %eax,-0x8(%rbp) at 0x1146.
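The final arithmetic is small enough to write down as a function. This is only a sketch: local_address is a hypothetical name, and the caller is assumed to have already read the CFA register's value from the stopped process (for example rbp while inside the body of do_stuff).

```c
#include <stdint.h>

/* Address of a DW_OP_fbreg variable when the frame base is the CFA:
 *   frame_base = cfa_reg_value + cfa_offset
 *   address    = frame_base + fbreg_offset
 * Unsigned arithmetic wraps, so negative offsets subtract as expected. */
uint64_t local_address(uint64_t cfa_reg_value, int64_t cfa_offset,
                       int64_t fbreg_offset)
{
    uint64_t frame_base = cfa_reg_value + (uint64_t)cfa_offset;
    return frame_base + (uint64_t)fbreg_offset;
}
```

With cfa_offset = 16 and fbreg_offset = -24, the result is always the register value minus 8, i.e. rbp - 8 for my_local.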

Looking at Mr. Bendersky's post again, this agrees with his answer, which I somehow missed. Apparently, starting with DWARF version 5, .debug_loc is not emitted by default (the frame base is expressed as DW_OP_call_frame_cfa instead, as in the first dump), which led me to the wrong conclusion initially.

I hope this solution is correct (it makes sense to me). Please let me know if it is not, as I am still uncertain about DWARF (it's very complicated for a newbie like me).

Upvotes: 1
