Reputation: 434
I am using ptrace to follow a process and monitor its behavior. At some point I would like to get the next rip address before hitting the next instruction. In fact, I would like to get the address of the instruction following a callq instruction. There are a few different such instructions (near, far, relative, absolute, etc.) and they don't all have the same length.
Is there a way, using ptrace, to get the size in bytes of an instruction once it has been retrieved? Something like the following:
long ip = ptrace(PTRACE_PEEKUSER, t_pid, ipoffs, 0); // address that rip points to (long, so the 64-bit value is not truncated)
long instruction = ptrace(PTRACE_PEEKTEXT, t_pid, ip, NULL); // e8 ae 72 f8 ff (relative call)
printf("Instruction is %d bytes", get_instruction_size(instruction)); // Instruction is 5 bytes
I am guessing one way to implement get_instruction_size would be to get the opcode (first 1 or 2 bytes) of the instruction and then determine how long it should be according to the x86 architecture manual. But I feel like there will be a lot of special cases to take into account and a lot of reading around to find the values, and this will change from one CPU architecture to another. On the other hand, dynamically finding the size seems much more convenient. I have not found an answer to this.
------ EDIT -------
Trying to retrieve the return address from rsp right after a call:
#define M_OFFSETOF(STRUCT, ELEMENT) \
    ((unsigned long) &((STRUCT *)NULL)->ELEMENT) /* no trailing ';' so the macro can be used inside expressions */
...
ipoffs = M_OFFSETOF(struct user, regs.rip);
spoffs = M_OFFSETOF(struct user, regs.rsp);
...
while(1) {
    // exec one instruction
    if(ptrace(PTRACE_SINGLESTEP, t_pid, 0, signo) < 0){
        perror("ptrace single step error");
        exit(EXIT_FAILURE);
    }
    // wait for the tracee to stop again before reading its registers or memory
    waitpid(t_pid, NULL, 0);
    ip = ptrace(PTRACE_PEEKUSER, t_pid, ipoffs, 0);
    full_instruction = ptrace(PTRACE_PEEKTEXT, t_pid, ip, NULL);
    // first byte of the instruction; skip over an addr32 prefix if present
    opcode = (unsigned)0xFF & full_instruction;
    if(opcode == ADDR32){
        opcode = ((unsigned)0xFF00 & full_instruction) >> 8;
    }
    // the previous instruction was a call, so it has now executed
    if(call_found){
        sp = ptrace(PTRACE_PEEKUSER, t_pid, spoffs, 0);
        // print sp ...
        call_found = false;
    }
    if(opcode == CALL)
        call_found = true;
}
Upvotes: 1
Views: 426
Reputation: 364308
ptrace doesn't have a disassembler in the kernel (see footnote 1), and the hardware itself won't tell you this until after the call instruction has executed.
If you can wait until after the instruction executes, probably your best bet is to PTRACE_SINGLESTEP, then read the return address that call pushed onto the stack. (ESP/RSP will point at it; see footnote 2.)
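The single-step approach could look roughly like the sketch below. It assumes x86-64 Linux with glibc, a tracee that is already attached and stopped with RIP at a call instruction, and no pending signal to forward. The helper name step_and_read_return_address is hypothetical, not part of any library.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>

/* Hypothetical helper (x86-64 Linux only).
 * Precondition: the tracee is stopped with RIP at a call instruction.
 * Executes that one instruction, then reads the 8-byte return address
 * the call pushed, i.e. the word RSP now points at.
 * Returns 0 on success and stores the address in *ret_addr, -1 on error. */
static int step_and_read_return_address(pid_t pid, uint64_t *ret_addr)
{
    int status;
    struct user_regs_struct regs;

    /* Let the tracee execute exactly one instruction. */
    if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) < 0)
        return -1;

    /* The tracee runs asynchronously; wait until it stops again. */
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) < 0)
        return -1;

    /* PEEKDATA returns the word itself, so -1 is a valid value:
     * errors must be detected through errno. */
    errno = 0;
    long word = ptrace(PTRACE_PEEKDATA, pid, (void *)regs.rsp, NULL);
    if (word == -1 && errno != 0)
        return -1;

    *ret_addr = (uint64_t)word;
    return 0;
}
```

The value stored in *ret_addr is the address of the instruction after the call, so subtracting the pre-step RIP from it gives the call's length without any decoding.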
The other option is of course to decode it yourself (including any prefixes which might get used for padding, like when ld relaxes a 6-byte call [got_entry] to a 1+5 byte addr32 call rel32). Either use a disassembler library, or scan through prefixes until you get to one of the call opcodes; then you have the length either from the opcode itself (E8 call rel32) or from decoding the ModRM byte for an indirect FF /2 call [r/m32] (https://www.felixcloutier.com/x86/call).
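A minimal sketch of the hand-decoding route, assuming 64-bit mode only and near calls only. The function name near_call_length is hypothetical. It deliberately gives up (returns 0) on anything it does not recognize, including far calls (FF /3) and the 0x66 operand-size prefix, rather than guessing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: length in bytes of the near CALL starting at p,
 * or 0 if the bytes are not a recognized near call or are truncated.
 * Assumes x86-64 long mode; the 0x66 prefix is not handled. */
static size_t near_call_length(const uint8_t *p, size_t avail)
{
    size_t i = 0;

    /* Skip legacy prefixes: segment overrides, addr32 (0x67), lock/rep. */
    while (i < avail) {
        uint8_t b = p[i];
        if (b == 0x26 || b == 0x2E || b == 0x36 || b == 0x3E ||
            b == 0x64 || b == 0x65 || b == 0x67 ||
            b == 0xF0 || b == 0xF2 || b == 0xF3)
            i++;
        else
            break;
    }

    /* An optional REX prefix (0x40..0x4F) sits right before the opcode. */
    if (i < avail && (p[i] & 0xF0) == 0x40)
        i++;
    if (i >= avail)
        return 0;

    /* E8 rel32: opcode plus a 4-byte displacement. */
    if (p[i] == 0xE8)
        return (i + 5 <= avail) ? i + 5 : 0;

    /* FF /2: indirect near call; the length depends on ModRM. */
    if (p[i] != 0xFF || i + 1 >= avail)
        return 0;
    uint8_t modrm = p[i + 1];
    if (((modrm >> 3) & 7) != 2)
        return 0;                       /* some other FF /n instruction */

    uint8_t mod = modrm >> 6;
    uint8_t rm  = modrm & 7;
    size_t len  = i + 2;                /* prefixes + opcode + ModRM */

    if (mod != 3 && rm == 4) {          /* a SIB byte follows */
        if (len >= avail)
            return 0;
        uint8_t sib = p[len++];
        if (mod == 0 && (sib & 7) == 5) /* SIB with no base: disp32 */
            len += 4;
    }
    if (mod == 1)
        len += 1;                       /* disp8 */
    else if (mod == 2)
        len += 4;                       /* disp32 */
    else if (mod == 0 && rm == 5)
        len += 4;                       /* RIP-relative disp32 */

    return (len <= avail) ? len : 0;
}
```

A single PTRACE_PEEKTEXT returns only one 8-byte word, so a caller would probably want to read two consecutive words into a 16-byte buffer before decoding, since a prefixed indirect call with SIB and disp32 can be longer than 8 bytes.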
If you wanted portability to non-x86 ISAs, many use a link register instead of pushing a return address so it's not identical; you can't just generically deref the stack pointer with uintptr_t width.
And many of those ISAs have fixed instruction widths so you could just go one instruction forward instead of single-stepping and reading a register. (Although many support a compact encoding with 2 or 4-byte instructions, like ARM Thumb, MIPS, and RISC-V).
There are other wrinkles on some ISAs, for example MIPS has a branch delay slot so the return address is actually after the next instruction after a jal.
Footnote 1: (fun fact: ARM Linux kernels used to have a disassembler with support for some instructions, so it could emulate single-step for you, but that hack was removed).
Footnote 2: Even for hand-written asm with a far call, the CS:[ER]IP return address will have the offset part at the lowest address, i.e. pointed to by ESP/RSP. Of course, far calls use different opcodes, so you could treat them separately or ignore them.
I'm not sure if it's possible for prefixes to override a size in a way that will get call to push a different-sized return address (e.g. 16-bit in 32-bit mode). Probably not, and even if so it would only be a concern for malicious binaries trying to fool your tracer on purpose. Even hand-written asm for GNU/Linux is vanishingly unlikely to do this for any normal reason.
Upvotes: 3