Given the line number of a variable access (not its declaration), how can I determine the variable's type (or its declaration DIE in the .debug_info tree)?
Look at the following code:
void foo()
{
    {
        struct A *b;
    }
    {
        struct B *b;
        b = malloc(sizeof(struct B));
    }
}
Suppose this source code is compiled with debug information in DWARF format. How can I determine, using the source code and the debug information, that the variable b accessed in the assignment is of type struct B *?
I mean: how can I automate this offline? The problem is that the .debug_info section of DWARF contains no mapping between source code (e.g., line numbers) and scope information. In the example above, the debug information tells us that there is a variable of type struct A * that is a child of foo() and a variable of type struct B * that is the other child of foo(). Parsing the source code can determine the nesting level at which the access occurs, but that is not enough to map the accessed variable to its type, because there are two declarations of b, with different types, at the same nesting level as the access.
If there were a way to force the compiler to include more information in the debug information, the problem could be solved. For example, adding DW_AT_low_pc and DW_AT_high_pc to the DIEs of type DW_TAG_lexical_block would help.
You have already answered almost all of your own question; there are only two things missing.
Firstly, the relationship between file name/line number and program counter is encoded in .debug_line, not .debug_info.
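A minimal sketch of the .debug_line side, in C. It assumes the line-number program has already been run into a table of (address, line) rows, for example by a DWARF library or by parsing `readelf --debug-dump=decodedline`; the `line_row` and `line_header` types and both function names are hypothetical. `decode_special` applies the special-opcode formula from the DWARF specification (ignoring the VLIW `op_index` field added in DWARF 4). readelf appears to print the opcode after subtracting `opcode_base`: with the header values GNU as usually emits (`opcode_base` 13, `line_base` -5, `line_range` 14, minimum instruction length 1), the raw opcode 137 decodes to the "advance Address by 8 ... and Line by 7" shown for "Special opcode 124" in the readelf excerpt below. Read the real values from the unit's .debug_line header rather than relying on these.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of an already-decoded .debug_line matrix (hypothetical type). */
struct line_row {
    uint64_t address;
    unsigned line;
};

/* Header fields that govern special opcodes; read them from the
 * .debug_line header of the unit being examined. */
struct line_header {
    uint8_t opcode_base;
    int8_t line_base;
    uint8_t line_range;
    uint8_t min_inst_length;
};

/* Special-opcode formula from the DWARF spec (op_index ignored). */
static void decode_special(const struct line_header *h, uint8_t opcode,
                           uint64_t *addr_adv, int *line_adv)
{
    assert(h->line_range != 0 && opcode >= h->opcode_base);
    unsigned adjusted = (unsigned)opcode - h->opcode_base;
    *addr_adv = (uint64_t)(adjusted / h->line_range) * h->min_inst_length;
    *line_adv = h->line_base + (int)(adjusted % h->line_range);
}

/* Lowest address whose row is attributed to `line`, or -1 if none. */
static int64_t address_for_line(const struct line_row *rows, size_t n,
                                unsigned line)
{
    int64_t best = -1;
    for (size_t i = 0; i < n; i++)
        if (rows[i].line == line &&
            (best < 0 || rows[i].address < (uint64_t)best))
            best = (int64_t)rows[i].address;
    return best;
}
```

Several rows can carry the same line (a statement may be split, or code moved by the optimiser), so taking the lowest address is only a heuristic; any address from a row with that line lies in code generated from it.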
Secondly, the variables are not children of foo(): each is a child of a lexical block. The relevant portion of the program structure will look like this:
DW_TAG_compile_unit
    DW_TAG_subprogram
        DW_TAG_lexical_block
            DW_TAG_variable
        DW_TAG_lexical_block
            DW_TAG_variable
The lexical block should be associated with an address range, but this might be encoded using DW_AT_ranges instead of DW_AT_low_pc/DW_AT_high_pc; if that's the case, you'll need to interpret .debug_ranges.
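A sketch of the containment test a tool needs, covering both encodings. The `scope_pcs` struct is a hypothetical, already-extracted view of one DIE's attributes. Two details are easy to miss: since DWARF 4, a DW_AT_high_pc of constant class is an offset from DW_AT_low_pc rather than an address, and .debug_ranges entries are relative to the compilation unit's base address (normally its DW_AT_low_pc), so `ranges` is assumed to hold already-rebased addresses. For the lexical block in the readelf output below (DW_AT_low_pc 0x8, DW_AT_high_pc 0xe), that means [0x8, 0x16) if the unit was emitted as DWARF 4 (GCC 4.8's default) and [0x8, 0xe) if the value is an address; address 0x8 is inside either way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address interval [low, high). */
struct pc_range {
    uint64_t low, high;
};

/* Hypothetical, already-extracted address attributes of one scope DIE. */
struct scope_pcs {
    bool has_low_high;              /* DW_AT_low_pc and DW_AT_high_pc present */
    bool high_is_offset;            /* DW_AT_high_pc is constant class (DWARF 4+) */
    uint64_t low_pc, high_pc;
    const struct pc_range *ranges;  /* DW_AT_ranges list, already rebased */
    size_t nranges;
};

/* True if pc lies in the scope. A block with neither encoding never matches. */
static bool scope_contains(const struct scope_pcs *s, uint64_t pc)
{
    assert(s->nranges == 0 || s->ranges != NULL);
    if (s->has_low_high) {
        uint64_t end = s->high_is_offset ? s->low_pc + s->high_pc : s->high_pc;
        return pc >= s->low_pc && pc < end;
    }
    for (size_t i = 0; i < s->nranges; i++)
        if (pc >= s->ranges[i].low && pc < s->ranges[i].high)
            return true;
    return false;
}
```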
To illustrate the case at hand, I compiled the following with cc -g (gcc 4.8.5 on Oracle Linux)...
 1  #include <stdlib.h>
 2
 3  struct A { int a; };
 4  struct B { int b; };
 5
 6  void foo()
 7  {
 8      {
 9          struct A *b;
10      }
11
12      {
13          struct B *b;
14          b = malloc(sizeof (struct B));
15      }
16  }
...and used 'readelf -w' to decode the DWARF. Line 14 appears here in the line number table:
[0x00000032] Special opcode 124: advance Address by 8 to 0x8 and Line by 7 to 14
meaning that we're interested in address 0x8. The DIE hierarchy includes
<0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
<1><96>: Abbrev Number: 6 (DW_TAG_subprogram)
    <9d> DW_AT_low_pc : 0x0
    <a5> DW_AT_high_pc : 0x18
<2><b3>: Abbrev Number: 7 (DW_TAG_lexical_block)
    <b4> DW_AT_low_pc : 0x8
    <bc> DW_AT_high_pc : 0xe
<3><c4>: Abbrev Number: 8 (DW_TAG_variable)
    <c5> DW_AT_name : b
    <c7> DW_AT_decl_file : 1
    <c8> DW_AT_decl_line : 13
    <c9> DW_AT_type : <0xd2>
The lexical block DIE at 0xb3 covers address 0x8 and contains no further lexical blocks, so it is the tightest scope at address 0x8. At this point, therefore, the name "b" must refer to that DIE's child at 0xc4. This variable's type is given by the following chain:
<1><d2>: Abbrev Number: 9 (DW_TAG_pointer_type)
    <d3> DW_AT_byte_size : 8
    <d4> DW_AT_type : <0x81>
<1><81>: Abbrev Number: 4 (DW_TAG_structure_type)
    <82> DW_AT_name : B
    <84> DW_AT_byte_size : 4
<2><8b>: Abbrev Number: 5 (DW_TAG_member)
    <8c> DW_AT_name : b
    <90> DW_AT_type : <0x34>
    <94> DW_AT_data_member_location: 0
<1><34>: Abbrev Number: 3 (DW_TAG_base_type)
    <35> DW_AT_byte_size : 4
    <36> DW_AT_encoding : 5 (signed)
    <37> DW_AT_name : int
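The lookup described above can be sketched over a hypothetical, flattened view of the DIEs (a real tool would populate it from .debug_info with a DWARF library such as libdwarf or elfutils' libdw). The `die` struct, tag names and functions are illustrative, and `high` is assumed to be an end address already (for DWARF 4, DW_AT_low_pc plus the constant DW_AT_high_pc). `innermost_scope` picks the deepest subprogram or lexical block containing the address, `lookup_variable` walks outward through enclosing scopes as C scoping does, and `type_name` follows DW_AT_type through DW_TAG_pointer_type to DW_TAG_structure_type. It ignores DW_AT_decl_line, so a use that precedes a shadowing declaration in the same block would be misattributed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, flattened view of the DIEs of one compilation unit. */
enum die_tag { TAG_COMPILE_UNIT, TAG_SUBPROGRAM, TAG_LEXICAL_BLOCK,
               TAG_VARIABLE, TAG_POINTER_TYPE, TAG_STRUCTURE_TYPE,
               TAG_BASE_TYPE };

struct die {
    enum die_tag tag;
    const char *name;          /* DW_AT_name, or NULL */
    const struct die *parent;  /* enclosing DIE, NULL for the CU */
    uint64_t low, high;        /* scope range [low, high); 0,0 if absent */
    const struct die *type;    /* DW_AT_type target; NULL means void */
};

static unsigned die_depth(const struct die *d)
{
    unsigned n = 0;
    while (d->parent) {
        d = d->parent;
        n++;
    }
    return n;
}

/* Deepest subprogram or lexical block whose range contains pc. */
static const struct die *innermost_scope(const struct die *dies, size_t n,
                                         uint64_t pc)
{
    const struct die *best = NULL;
    for (size_t i = 0; i < n; i++) {
        const struct die *d = &dies[i];
        if (d->tag != TAG_SUBPROGRAM && d->tag != TAG_LEXICAL_BLOCK)
            continue;
        if (pc >= d->low && pc < d->high &&
            (best == NULL || die_depth(d) > die_depth(best)))
            best = d;
    }
    return best;
}

/* Find a variable called name in scope, then in each enclosing scope. */
static const struct die *lookup_variable(const struct die *dies, size_t n,
                                         const struct die *scope,
                                         const char *name)
{
    assert(name != NULL);
    for (; scope != NULL; scope = scope->parent)
        for (size_t i = 0; i < n; i++)
            if (dies[i].tag == TAG_VARIABLE && dies[i].parent == scope &&
                dies[i].name != NULL && strcmp(dies[i].name, name) == 0)
                return &dies[i];
    return NULL;
}

/* Render a simplified C spelling of a type by following DW_AT_type. */
static void type_name(const struct die *t, char *buf, size_t len)
{
    assert(len > 0);
    if (t == NULL) {
        snprintf(buf, len, "void");
    } else if (t->tag == TAG_POINTER_TYPE) {
        char inner[128];
        type_name(t->type, inner, sizeof inner);
        size_t k = strlen(inner);
        /* "struct B" -> "struct B *", "char *" -> "char **" */
        const char *sep = (k > 0 && inner[k - 1] == '*') ? "" : " ";
        snprintf(buf, len, "%s%s*", inner, sep);
    } else if (t->tag == TAG_STRUCTURE_TYPE) {
        snprintf(buf, len, "struct %s", t->name ? t->name : "<anonymous>");
    } else {
        snprintf(buf, len, "%s", t->name ? t->name : "?");
    }
}
```

With DIEs modelled on the example (foo over [0x0, 0x18), the second block over [0x8, 0x16)), looking up b at address 0x8 selects the variable corresponding to the DIE at 0xc4 and renders its type as struct B *.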
EDIT:
In your own answer you've given a counter-example from mplayer in which there are lexical blocks without corresponding address ranges. Such DWARF does not conform to the standard: §3.4 of DWARF 2 states that a lexical block entry has DW_AT_low_pc and DW_AT_high_pc attributes and makes no suggestion that these are optional. A likely candidate for this bug, assuming you're using gcc, is the GCC bug "DWARF debug info for inlined lexical blocks missing range". The default mplayer configuration includes -O2 optimisation, which turns on inlining; you will see this reflected in the parent DW_TAG_subprogram for draw_vertices(), from which the example code is taken. A workaround for the bug is to add -fno-inline to the compiler options; this does not seem to suppress all inlining, so you may wish to disable optimisation altogether.
Here is the output of objdump --dwarf=info mplayer for MPlayer-1.3.0 compiled with the -gdwarf-2 option:
<2><4000e>: Abbrev Number: 43 (DW_TAG_lexical_block)
<3><4000f>: Abbrev Number: 37 (DW_TAG_variable)
    <40010> DW_AT_name : px
    <40013> DW_AT_decl_file : 1
    <40014> DW_AT_decl_line : 2079
    <40016> DW_AT_type : <0x38aed>
<3><4001a>: Abbrev Number: 37 (DW_TAG_variable)
    <4001b> DW_AT_name : py
    <4001e> DW_AT_decl_file : 1
    <4001f> DW_AT_decl_line : 2080
    <40021> DW_AT_type : <0x38aed>
<3><40025>: Abbrev Number: 0
<2><40026>: Abbrev Number: 0
As you can see at offset 0x4000e, there is a lexical block with no attributes at all, in particular no DW_AT_low_pc/DW_AT_high_pc. The corresponding source code is at libvo/gl_common.c:2078:
for (i = 0; i < 4; i++) {
    int px = 2*i;
    int py = 2*i + 1;
    mpglTexCoord2f(texcoords[px], texcoords[py]);
    if (is_yv12) {
        mpglMultiTexCoord2f(GL_TEXTURE1, texcoords2[px], texcoords2[py]);
        mpglMultiTexCoord2f(GL_TEXTURE2, texcoords2[px], texcoords2[py]);
    }
    if (use_stipple)
        mpglMultiTexCoord2f(GL_TEXTURE3, texcoords3[px], texcoords3[py]);
    mpglVertex2f(vertices[px], vertices[py]);
}
The block is the body of a for loop. There are many more lexical blocks like this one.
My solution consists of two parts:
1) Source code analysis:
Find the scope (the surrounding left and right braces) in which the target variable is accessed. In fact, we only need to store the line number of the left brace.
Find the level of that scope in the tree of scopes (a tree of parent/child relationships similar to the one in .debug_info).
At this point we have the start line of the scope containing the variable access and the level of that scope in the tree of scopes (e.g., line 12 and level 2 for the access on line 14 of the numbered listing above).
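The source-side step can be sketched as a naive brace scan in C. `enclosing_scope` and `scope_info` are hypothetical names; the scan does not skip braces inside comments, string or character literals, or macros, and it assumes nesting shallower than `MAX_DEPTH`, so treat it as an illustration rather than a parser. It reports the line of the innermost `{` still open when the access line begins and the number of open braces, counting the function body as level 1.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 256

/* Line of the innermost open '{' and the number of open braces. */
struct scope_info {
    unsigned open_line;
    unsigned level;
};

/* Naive scan of all lines before target_line (1-based). Braces in
 * comments, literals or macros are NOT skipped. */
static struct scope_info enclosing_scope(const char *src, unsigned target_line)
{
    unsigned stack[MAX_DEPTH];
    unsigned depth = 0, line = 1;
    struct scope_info r = {0, 0};
    assert(src != NULL);
    for (const char *p = src; *p && line < target_line; p++) {
        if (*p == '\n')
            line++;
        else if (*p == '{' && depth < MAX_DEPTH)
            stack[depth++] = line;      /* remember where the block opened */
        else if (*p == '}' && depth > 0)
            depth--;                    /* block closed before the access */
    }
    if (depth > 0) {
        r.open_line = stack[depth - 1];
        r.level = depth;
    }
    return r;
}
```

Run on the 16-line listing above with the access on line 14, it returns line 12 and level 2, matching the example.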
2) Debug info analysis:
Now we can analyze the appropriate CU and look for the declarations of the target variable. The important point is that only declarations whose line number is smaller than the line number of the access are valid. With this in mind, we can search the global scope first and then continue with deeper levels, in order.
Declarations in scopes deeper than the scope of the access are invalid. Declarations at the same level as the access are valid only if their line number lies between the start line of the access's scope and the line number of the access.
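The filtering rules can be sketched as follows. Each `candidate` is a hypothetical summary of one DW_TAG_variable with the searched name: its DW_AT_decl_line and the depth of its parent scope in the DIE tree (compilation unit 0, subprogram 1, first lexical block 2, which lines up with the brace levels from the source scan). Choosing the deepest surviving candidate, and the latest line on a tie, is an assumption about how "continue with deeper levels, in order" resolves; like the description, it would still accept a shallower declaration that sits in an earlier, already closed sibling block.

```c
#include <assert.h>
#include <stddef.h>

/* One declaration of the searched name (hypothetical summary). */
struct candidate {
    unsigned level;      /* depth of the parent scope: CU = 0, function = 1 */
    unsigned decl_line;  /* DW_AT_decl_line */
};

/* Apply the filtering rules; return the index of the chosen
 * declaration, or -1 if none survives. */
static int pick_declaration(const struct candidate *c, size_t n,
                            unsigned access_line, unsigned access_level,
                            unsigned scope_open_line)
{
    int best = -1;
    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Declared after the access, or in a deeper scope: invalid. */
        if (c[i].decl_line >= access_line || c[i].level > access_level)
            continue;
        /* Same level: must lie inside the access's own scope. */
        if (c[i].level == access_level && c[i].decl_line < scope_open_line)
            continue;
        /* Prefer the deepest level, then the latest declaration. */
        if (best < 0 || c[i].level > c[best].level ||
            (c[i].level == c[best].level && c[i].decl_line > c[best].decl_line))
            best = (int)i;
    }
    return best;
}
```

For the question's example (declarations of b at level 2 on lines 9 and 13, access on line 14 in the block opened on line 12) it selects the line 13 declaration, i.e. struct B *.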