Reputation: 85
I have read the PE and COFF specification, Matt Pietrek's "Peering Inside the PE: A Tour of the Win32 Portable Executable File Format" and "An In-Depth Look into the Win32 Portable Executable File Format" and also several other sources about the subject.
I was able to read out the COFF section table and also the COFF symbol table from an object file generated by MinGW GCC 4.7 (I was compiling a static library in debug mode).
My ultimate goal is to access all functions defined in a given object file (COFF) and read out all bytes that make up their machine code.
Question 1: How do I calculate the start address of a single function inside the COFF file? I presume I have to somehow use the symbol record's "Value" field as an offset into the section specified by "SectionNumber".
Question 2: How do I find out the length of any given function (how many bytes I would have to read)?
Question 3: According to Microsoft's PE & COFF specification there should be an auxiliary symbol table record after each symbol record that defines a function. Why is it that in my object file (extracted from an .a file which was compiled in debug mode) only one of the three defined functions has such an auxiliary record, and why is that record completely filled with zeroes?
Upvotes: 2
Views: 2016
Reputation: 129314
Q1: Yes, that seems reasonable.
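As a rough sketch of that calculation (not tested against the asker's MinGW output): in an object file the symbol's Value is an offset into the section named by SectionNumber, so the file offset of the function is that section's PointerToRawData plus Value. The function names below are made up for illustration; the field offsets follow the COFF file header (20 bytes, SizeOfOptionalHeader at offset 16) and section header (40 bytes, PointerToRawData at offset 20) layouts, and the code assumes a plain little-endian object file with a positive, 1-based SectionNumber.

```python
import struct

FILE_HEADER_SIZE = 20     # size of the COFF file header
SECTION_HEADER_SIZE = 40  # size of one section table entry


def section_raw_pointer(data: bytes, section_number: int) -> int:
    """Return PointerToRawData for a 1-based section number."""
    # SizeOfOptionalHeader is normally 0 in object files, but honour it anyway.
    size_of_optional_header = struct.unpack_from("<H", data, 16)[0]
    table_start = FILE_HEADER_SIZE + size_of_optional_header
    header_offset = table_start + (section_number - 1) * SECTION_HEADER_SIZE
    # PointerToRawData sits at offset 20 inside the section header.
    return struct.unpack_from("<I", data, header_offset + 20)[0]


def function_file_offset(data: bytes, section_number: int, value: int) -> int:
    """File offset of a symbol: start of its section's raw data plus Value."""
    return section_raw_pointer(data, section_number) + value
```

Symbols with SectionNumber 0 or a negative value (undefined, absolute, debug) do not live in any section and would have to be filtered out before calling this.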
Q2: Probably difficult. Depends on the processor architecture. There is no guarantee that there is any function information giving the length of the function at all - in particular, there doesn't appear to be anything available for x86 (32-bit), and information on length is only sometimes available on other architectures [when it's required for unwinding after exceptions].
The best way is probably to load the symbol table, find the NEXT function in address order, and assume that the function extends from its start to the byte just before that next function. For the last function, the length is obviously "until the end of the section". Many years ago, I used the method of recognising return instructions to find the length of functions, but modern compilers often generate code that has more than one return instruction, put if/else code after the return with a jump back to the main function code, etc., so it may not be a reliable method [and of course if someone does x = 0xc3;, the 0xc3 will look like a return instruction, but it's actually data... ;)].
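The "next symbol" heuristic could be sketched roughly like this. It assumes you have already collected the function symbols of one section into a dictionary of name to section-relative offset (the Value field) and know that section's SizeOfRawData; `estimate_function_sizes` and its parameters are hypothetical names, not part of any library.

```python
def estimate_function_sizes(function_offsets, section_size):
    """Estimate each function's length as the gap to the next function.

    function_offsets: dict mapping symbol name -> offset within the section.
    section_size: SizeOfRawData of that section.
    The result is an upper bound: alignment padding after a function
    is counted as part of it.
    """
    ordered = sorted(function_offsets.items(), key=lambda item: item[1])
    sizes = {}
    for index, (name, start) in enumerate(ordered):
        if index + 1 < len(ordered):
            end = ordered[index + 1][1]  # start of the next function
        else:
            end = section_size           # last function runs to section end
        sizes[name] = end - start
    return sizes
```

For example, `estimate_function_sizes({"_a": 0x00, "_b": 0x20}, 0x30)` treats `_a` as 0x20 bytes long and `_b` as 0x10 bytes long, whether or not the tail of each range is padding.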
Q3: Auxiliary records are entirely optional:
Zero or more auxiliary symbol-table records immediately follow each standard symbol-table record. However, typically not more than one auxiliary symbol-table record follows a standard symbol-table record (except for .file records with long file names).
If there are auxiliary symbol-table records, their number is given by the one-byte NumberOfAuxSymbols field at offset 17 of the symbol-table record.
This may be confusing if you read only the later text:
Auxiliary symbol table records always follow, and apply to, some standard symbol table record.
I think this should be treated as "If there is an auxiliary symbol-table record, it comes immediately after the standard symbol-table record".
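In practice this means a symbol-table reader has to look at the count byte of every standard record and skip that many 18-byte slots, whether that is zero, one or more. A minimal sketch, assuming a little-endian object file and the standard 18-byte record layout (Name, Value, SectionNumber, Type, StorageClass, NumberOfAuxSymbols); `iter_symbols` is a made-up helper name, and `symtab_offset` and `symbol_count` would come from PointerToSymbolTable and NumberOfSymbols in the file header:

```python
import struct

SYMBOL_SIZE = 18  # every record, standard or auxiliary, occupies 18 bytes


def iter_symbols(data, symtab_offset, symbol_count):
    """Yield the standard symbol records, skipping auxiliary records."""
    index = 0
    while index < symbol_count:
        record = symtab_offset + index * SYMBOL_SIZE
        # 8-byte name, Value, SectionNumber, Type, StorageClass, aux count.
        name, value, section, sym_type, storage_class, aux_count = (
            struct.unpack_from("<8sIhHBB", data, record)
        )
        yield index, name, value, section, sym_type, storage_class, aux_count
        # NumberOfSymbols counts auxiliary records too, so step over them.
        index += 1 + aux_count
```

The raw 8-byte name is returned as-is; resolving long names through the string table is left out of this sketch.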
Upvotes: 1