Reputation: 162
I am building a disassembler for RISC-V binaries using the Capstone engine. After validating the input file (architecture, bitness, whether it has any program headers, ...), I have this for loop that iterates over all program headers looking for the ones that contain executable code.
void checkElf(const char *elfFile)
{
    // Here would be the mentioned checks
    uint8_t i;
    for (i = 0; i < header.e_phnum; i++) {
        uint32_t offset = header.e_phoff + header.e_phentsize * i;
        fseek(file, offset, SEEK_SET);
        fread(&program_header, sizeof(program_header), 1, file);
        if (((PF_X | PF_R) == program_header.p_flags)) {
            dumpCode(file, &program_header, &header);
        }
    }
}
If any program header is marked as executable, then I call the following function:
static void dumpCode(FILE *file, Elf32_Phdr *segm, Elf32_Ehdr *header)
{
    int32_t *opcode;
    uint32_t offset, vaddr, i;
    char *mappedFile;
    struct stat statbuf;
    int fd;

    fd = fileno(file);
    fstat(fd, &statbuf);
    mappedFile = (char *) mmap(0, statbuf.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    offset = segm->p_offset;
    opcode = (int *) (mappedFile + offset);
    vaddr = segm->p_vaddr;
    i = 0;
    if (0 == offset) {
        vaddr = header->e_entry;
        i = (header->e_entry - segm->p_vaddr) / 4;
        opcode += i;
    }
    for (; i < segm->p_filesz / 4; i++, vaddr += 4) {
        // do stuff...
    }
}
In that function, if the current program header starts at offset 0 (and so contains the ELF header), I advance the virtual address and the opcode pointer to the entry point; otherwise I start disassembling directly.
My question is: should I care about where the program header containing the executable code is placed? Or, better said, could the program header that contains the executable code be placed somewhere else?
Upvotes: 0
Views: 121
Reputation: 213636
I think this answer answers the question you are actually asking.
Your code assumes that an executable PT_LOAD segment contains executable code and nothing else, but that is generally not the case: as the two-segment example in the cited answer shows, a typical executable layout may have all of these sections in that segment: .interp .note.ABI-tag .dynsym .dynstr .gnu.hash .hash .gnu.version .gnu.version_r .rela.dyn .init .text .fini .rodata .eh_frame .eh_frame_hdr, and so you'll disassemble a whole lot of garbage.
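To see for yourself which sections share one segment, you can compare file ranges. This is only a minimal sketch: it assumes the Linux <elf.h> types, a 32-bit ELF, and headers that have already been read into memory. The helper name section_in_segment is hypothetical, not part of any library.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when the section's file bytes lie entirely
 * inside the segment's file image. SHT_NOBITS sections (e.g. .bss) occupy
 * no bytes in the file, so they are reported as not contained. */
static bool section_in_segment(const Elf32_Shdr *sh, const Elf32_Phdr *ph)
{
    assert(sh != NULL && ph != NULL);
    if (sh->sh_type == SHT_NOBITS)
        return false;
    /* Widen to 64 bits so offset + size cannot wrap around. */
    uint64_t sec_end = (uint64_t)sh->sh_offset + sh->sh_size;
    uint64_t seg_end = (uint64_t)ph->p_offset + ph->p_filesz;
    return sh->sh_offset >= ph->p_offset && sec_end <= seg_end;
}
```

Calling this for every section header against your executable PT_LOAD segment should list .rodata, .eh_frame and friends next to .text, assuming the binary has the layout described above.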
There is also absolutely no guarantee that only .text follows e_entry, so skipping the beginning of the segment up to e_entry doesn't solve anything.
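One way to avoid disassembling non-code bytes, when the file still has its section header table, is to walk the sections and keep only those flagged SHF_EXECINSTR. The sketch below rests on several assumptions: a 32-bit ELF already mapped into memory, a host with the same byte order as the file, and the Linux <elf.h> definitions. The function name find_code_sections is hypothetical. A stripped binary may have no section headers at all, in which case this returns 0 and you need another strategy.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy up to `max` section headers that hold
 * instructions (SHT_PROGBITS with SHF_EXECINSTR) into `out`.
 * Returns the number found, or 0 if the section header table is
 * absent or does not fit inside the image. */
static size_t find_code_sections(const uint8_t *image, size_t size,
                                 Elf32_Shdr *out, size_t max)
{
    Elf32_Ehdr eh;
    size_t found = 0;

    assert(image != NULL && out != NULL);
    if (size < sizeof eh)
        return 0;
    memcpy(&eh, image, sizeof eh);   /* memcpy avoids unaligned access */

    if (eh.e_shoff == 0 || eh.e_shentsize < sizeof(Elf32_Shdr))
        return 0;
    if ((uint64_t)eh.e_shoff + (uint64_t)eh.e_shnum * eh.e_shentsize > size)
        return 0;

    for (size_t i = 0; i < eh.e_shnum && found < max; i++) {
        Elf32_Shdr sh;
        memcpy(&sh, image + eh.e_shoff + i * eh.e_shentsize, sizeof sh);
        if (sh.sh_type == SHT_PROGBITS && (sh.sh_flags & SHF_EXECINSTR))
            out[found++] = sh;
    }
    return found;
}
```

Each returned header gives sh_offset, sh_size and sh_addr, which could replace p_offset, p_filesz and p_vaddr in dumpCode, so the loop covers only sections such as .init, .text and .fini.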
Upvotes: 0