Josep

Reputation: 162

Extracting executable code from program headers

I am building a disassembler for RISC-V binaries using the Capstone engine. After validating the input file (architecture, bitness, whether it has any program headers...), I have this for loop that iterates over all program headers, looking for the ones that contain executable code.

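void checkElf(const char *elfFile)
{
    // Here would be the mentioned checks
    uint16_t i; // e_phnum is 16 bits wide, so a uint8_t counter could wrap around
    for (i = 0; i < header.e_phnum; i++) {
        uint32_t offset = header.e_phoff + header.e_phentsize * i;
        fseek(file, offset, SEEK_SET);
        if (fread(&program_header, sizeof(program_header), 1, file) != 1) {
            break; // truncated program header table
        }
        // Exact match: segments that are readable and executable, and nothing else
        if (((PF_X | PF_R) == program_header.p_flags)) {
            dumpCode(file, &program_header, &header);
        }
    }
}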

If any program header is marked as executable, then I call the following function:

static void dumpCode(FILE *file, Elf32_Phdr *segm, Elf32_Ehdr *header)
{
    int32_t *opcode;
    uint32_t offset, vaddr, i;
    char *mappedFile;
    struct stat statbuf;
    int fd;

    fd = fileno(file);
    if (fstat(fd, &statbuf) != 0) {
        return;
    }

    mappedFile = (char *) mmap(0, statbuf.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (MAP_FAILED == (void *) mappedFile) {
        return;
    }

    offset = segm->p_offset;
    opcode = (int32_t *) (mappedFile + offset);
    vaddr = segm->p_vaddr;
    i = 0;

    // Segment starts at file offset 0, so it also holds the ELF header:
    // skip ahead to the entry point
    if (0 == offset) {
        vaddr = header->e_entry;
        i = (header->e_entry - segm->p_vaddr) / 4;
        opcode += i;
    }

    for (; i < segm->p_filesz / 4; i++, vaddr += 4) {
        // do stuff...
    }

    munmap(mappedFile, statbuf.st_size);
}

In that function, if the current program header starts at offset 0 (so its segment contains the ELF header), I move the virtual address and the opcode pointer forward to the entry point; otherwise I start disassembling from the beginning of the segment.

My question is: should I care about where the program header containing the executable code is placed? Or, put differently, could the segment that contains the executable code be placed somewhere else?

Upvotes: 0

Views: 121

Answers (1)

Employed Russian

Reputation: 213636

I think this answer answers the question you are actually asking.

Your code assumes that an executable PT_LOAD segment contains executable code and nothing else, but that is generally not the case. As the two-segment example in the cited answer shows, a typical executable layout may have all of these sections in that segment: .interp .note.ABI-tag .dynsym .dynstr .gnu.hash .hash .gnu.version .gnu.version_r .rela.dyn .init .text .fini .rodata .eh_frame .eh_frame_hdr. So you'll disassemble a whole lot of garbage.

There is also absolutely no guarantee that only .text follows e_entry, so skipping the beginning of the segment up to e_entry doesn't solve anything.
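
One way around this, assuming the binary still has its section header table (it may have been stripped, since the table is not needed at run time), is to walk the sections instead of the segments and keep only those flagged SHF_EXECINSTR. Below is a minimal sketch for a 32-bit ELF in the host's byte order, working on a file image already in memory. `CodeRange` and `findCodeSections` are hypothetical names made up for this example, not part of any library.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: one contiguous range of code in the file. */
typedef struct {
    uint32_t offset; /* file offset of the section contents */
    uint32_t addr;   /* virtual address of the first byte */
    uint32_t size;   /* size in bytes */
} CodeRange;

/* Collects up to maxOut sections that hold instructions (SHT_PROGBITS with
 * SHF_EXECINSTR set) and returns how many were stored. Returns 0 when the
 * image has no usable section header table. */
static size_t findCodeSections(const uint8_t *image, size_t imageSize,
                               CodeRange *out, size_t maxOut)
{
    Elf32_Ehdr eh;
    size_t n = 0;

    if (imageSize < sizeof eh) {
        return 0;
    }
    memcpy(&eh, image, sizeof eh);
    if (0 == eh.e_shoff || eh.e_shentsize < sizeof(Elf32_Shdr)) {
        return 0; /* stripped or malformed */
    }

    for (uint16_t i = 0; i < eh.e_shnum; i++) {
        uint64_t pos = (uint64_t) eh.e_shoff + (uint64_t) eh.e_shentsize * i;
        Elf32_Shdr sh;

        if (pos + sizeof sh > imageSize) {
            break; /* section header table runs past the end of the file */
        }
        memcpy(&sh, image + pos, sizeof sh);

        if (sh.sh_type != SHT_PROGBITS || !(sh.sh_flags & SHF_EXECINSTR)) {
            continue; /* data, notes, symbol tables, ... */
        }
        if ((uint64_t) sh.sh_offset + sh.sh_size > imageSize) {
            continue; /* contents would lie outside the file */
        }
        if (n < maxOut) {
            out[n++] = (CodeRange){ sh.sh_offset, sh.sh_addr, sh.sh_size };
        }
    }
    return n;
}
```

Each returned range could then be handed to the disassembler, with `offset` locating the bytes in the mapped file and `addr` as the starting address. If the section headers are gone, this finds nothing, and you are back to guessing from the segments.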

Upvotes: 0
