Reputation: 827
The instructions in x86 machine code are variable length. I've studied the x86 instruction set thoroughly. I've read about how to convert assembly to machine code. But I didn't see in any of my studying so far (maybe I missed something) how the processor knows where one instruction ends and the next begins.
Take the following:
XOR CL, [12H] = 00110010 00001110 00010010 00000000 = 32H 0EH 12H 00H
XOR CL, 12H = 10000000 11110001 00010010 = 80 F1 12
If I'm looking at:
00110010 00001110 00010010 00000000 10000000 11110001 00010010 ...
32H      0EH      12H      00H      80       F1       12       ...
                                    ^
How do I know the next instruction starts here?
When I was studying the OSI model in networking, protocols solved the variable-length problem by putting a length field at the start of each header, telling you how much content that layer carries. But CPU instructions are much more compact than packets and don't seem to contain anything like that.
Why? What am I trying to do, really?
My goal is to analyze the machine code of a program (without a disassembler - I need maximum processing speed to analyze large volumes of data, and a disassembler does more work than I need to do, like mapping the binary to string syntax) and record certain statistics about the opcodes used. But I obviously have to figure out where one instruction ends and the next begins to do that.
Looking at x86 machine code, how do I determine the starting location of the next instruction?
Upvotes: 1
Views: 483
Reputation: 179971
There's just no explicit marker. You need to decode each instruction in turn: each instruction has a definite length, and the next instruction follows immediately afterwards.
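To make that concrete, here is a minimal sketch in C that walks the exact byte stream from your question. It hard-codes the length rules for just the two opcodes involved (32H with a 16-bit-mode ModRM, and the 80H immediate group), so treat it as an illustration of the "decode enough to know the length, then jump ahead" loop rather than a real decoder; a real one needs the full opcode tables from the Intel SDM (or a library such as XED or Zydis) to handle prefixes, every operand pattern, SIB bytes in 32/64-bit mode, bounds checks, and so on.

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Displacement bytes implied by a ModRM byte in 16-bit addressing mode. */
static size_t disp16_len(uint8_t modrm)
{
    uint8_t mod = modrm >> 6;
    uint8_t rm  = modrm & 7;

    if (mod == 0) return (rm == 6) ? 2 : 0;  /* mod=00, rm=110: direct disp16 */
    if (mod == 1) return 1;                  /* disp8  */
    if (mod == 2) return 2;                  /* disp16 */
    return 0;                                /* mod=11: register operand */
}

/* Length of the instruction starting at p, or 0 if this sketch can't decode it. */
static size_t insn_length(const uint8_t *p)
{
    switch (p[0]) {
    case 0x32:                       /* XOR r8, r/m8: opcode + ModRM + disp        */
        return 2 + disp16_len(p[1]);
    case 0x80:                       /* group 1 r/m8, imm8: + ModRM + disp + imm8  */
        return 2 + disp16_len(p[1]) + 1;
    default:
        return 0;                    /* everything else is out of scope here       */
    }
}

int main(void)
{
    /* The byte stream from the question: XOR CL,[12H] followed by XOR CL,12H. */
    const uint8_t code[] = { 0x32, 0x0E, 0x12, 0x00, 0x80, 0xF1, 0x12 };
    size_t off = 0;

    while (off < sizeof code) {
        size_t len = insn_length(code + off);
        if (len == 0) {
            printf("unknown opcode %02X at offset %zu\n", code[off], off);
            break;
        }
        printf("instruction at offset %zu, opcode %02X, length %zu\n",
               off, code[off], len);
        off += len;   /* the next instruction starts right after this one */
    }
    return 0;
}

Run on those seven bytes, it reports a 4-byte instruction at offset 0 and a 3-byte instruction at offset 4, which is exactly the boundary marked in your dump. Your statistics pass can be the same loop with a counter per opcode instead of the printf.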
If you look at more modern variable-length encodings such as UTF-8, you'll find that they are more logically defined than the x86 instruction set. That's just a consequence of lessons learned. ARM learned the lesson too, and made all of its instructions 32 bits long.
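For contrast, here is a small sketch of what "more logically defined" buys you: in UTF-8 the lead byte alone tells you how many bytes the sequence occupies, and a continuation byte can never be mistaken for a lead byte, so you can resynchronize from any offset.

#include <stdio.h>

/* Number of bytes in a UTF-8 sequence, read from its lead byte alone. */
static int utf8_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;   /* 0xxxxxxx: ASCII               */
    if ((lead & 0xE0) == 0xC0) return 2;   /* 110xxxxx                      */
    if ((lead & 0xF0) == 0xE0) return 3;   /* 1110xxxx                      */
    if ((lead & 0xF8) == 0xF0) return 4;   /* 11110xxx                      */
    return -1;                             /* 10xxxxxx: a continuation byte */
}

int main(void)
{
    /* "a", U+20AC (euro sign), U+10348: a 1-, 3- and 4-byte sequence. */
    const unsigned char s[] = { 0x61, 0xE2, 0x82, 0xAC, 0xF0, 0x90, 0x8D, 0x88 };

    for (size_t i = 0; i < sizeof s; ) {
        int n = utf8_len(s[i]);
        printf("sequence of %d byte(s) at offset %zu\n", n, i);
        i += (size_t)n;
    }
    return 0;
}

x86 gives you no such guarantee: the same byte value can be an opcode, a ModRM byte, a displacement or an immediate depending on where decoding started, which is why you can only find instruction boundaries by decoding from a known starting point.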
Upvotes: 2