Reputation: 51
I have a problem parsing Lua bytecode generated by LuaJ. Between the instruction list and the constant count, something goes wrong: it looks like a byte is missing. I'm using LuaJ 2.0.3.
Here is a hexdump that shows what I mean:
The bytecode was generated using:
string.dump(function() return "athin" end)
The constant count shows 250 constants, but there should be only one. If there were one more byte between the instruction list and the constant count, it would work perfectly:
the constant count would be 1, the type of the first constant would be 4 (string), and the string would have a length of 6, including the null at the end.
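For reference, here is a minimal Python sketch of how a correctly aligned constant list would parse. It assumes big-endian integers, a 4-byte size_t and 8-byte doubles; a real parser should take these from the chunk header. The `sample` bytes are built by hand from the layout described above, not copied from the actual dump, and `read_constants` is a hypothetical helper name.

```python
import struct

def read_constants(data, offset=0):
    """Read a Lua 5.1 constant list starting at offset.

    Assumes big-endian ints, a 4-byte size_t and 8-byte double numbers.
    Returns (list_of_constants, offset_after_the_list).
    """
    (count,) = struct.unpack_from(">I", data, offset)
    offset += 4
    constants = []
    for _ in range(count):
        kind = data[offset]          # one type byte per constant
        offset += 1
        if kind == 0:                # LUA_TNIL
            constants.append(None)
        elif kind == 1:              # LUA_TBOOLEAN, one byte
            constants.append(data[offset] != 0)
            offset += 1
        elif kind == 3:              # LUA_TNUMBER
            (value,) = struct.unpack_from(">d", data, offset)
            offset += 8
            constants.append(value)
        elif kind == 4:              # LUA_TSTRING, length includes the trailing NUL
            (size,) = struct.unpack_from(">I", data, offset)
            offset += 4
            raw = data[offset:offset + size]
            offset += size
            constants.append(raw[:-1].decode("latin-1"))
        else:
            raise ValueError("unexpected constant type %d" % kind)
    return constants, offset

# Hand-built example: count = 1, type = 4 (string), length = 6, "athin" plus NUL.
sample = bytes.fromhex("00000001 04 00000006") + b"athin\x00"
print(read_constants(sample))
```

If the stream is off by one byte, the first four bytes no longer decode to 1, which is the kind of bogus count described above.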
Why is that not working? Why is there a byte missing? What do I have to do to fix this?
Upvotes: 2
Views: 903
Reputation: 21
Note: I posted this on the CC forums here first.
You are, in fact, missing a 0x00 byte. In the "Instructions" section, you have 00 00 00 01 01 00 00 1E 00 00 1E 00.
Looking at A No-Frills Introduction to Lua 5.1 VM Instructions, that translates to:
LOADK 0 0 -- Load constant at index 0 into register number 0.
RETURN 0 2 -- Return 1 value, starting at register number 0.
MOVE 120 0 -- Copy the value of register number 0 into register number 120.
That last one doesn't make any sense. Why would the bytecode generator insert such a ridiculous instruction that will never be executed?
If you add one 0x00 byte to the last instruction, it reads as 00 00 00 01 01 00 00 1E 00 00 00 1E.
That translates to:
LOADK 0 0 -- Load constant at index 0 into register number 0.
RETURN 0 2 -- Return 1 value, starting at register number 0.
RETURN 0 0 -- Return all values from register number 0 to the top of the stack.
If you read the PDF, you will find that the bytecode generator always adds a return statement to the end of the bytecode, even if there's already an explicit return statement in the Lua source. Therefore, this disassembly makes sense.
Anyway, if you add an extra 0x00 byte there, it shifts the rest of the bytecode over so that it makes sense, as you said. The difference is that the missing 0x00 byte isn't between "Instructions" and "Number of Constants"; it's part of an instruction.
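If you want to check the two disassemblies yourself, here is a small Python sketch that decodes the 12 instruction bytes both as posted and with the extra 0x00 byte. It assumes big-endian 32-bit instructions with the standard Lua 5.1 field layout (6-bit opcode, 8-bit A, 9-bit C, 9-bit B), and it only knows the three opcodes that appear here. The names `decode` and `disassemble` are just illustrative helpers.

```python
import struct

# Opcode numbers from the Lua 5.1 instruction set; only the ones needed here.
OPCODES = {0: "MOVE", 1: "LOADK", 30: "RETURN"}

def decode(word):
    """Split one 32-bit Lua 5.1 instruction into (name, A, B, Bx)."""
    opcode = word & 0x3F            # bits 0-5
    a = (word >> 6) & 0xFF          # bits 6-13
    b = (word >> 23) & 0x1FF        # bits 23-31
    bx = (word >> 14) & 0x3FFFF     # bits 14-31 (B and C combined)
    return OPCODES.get(opcode, "OP%d" % opcode), a, b, bx

def disassemble(hex_bytes):
    """Decode a string of hex bytes as big-endian instructions."""
    data = bytes.fromhex(hex_bytes)
    words = struct.unpack(">%dI" % (len(data) // 4), data)
    lines = []
    for word in words:
        name, a, b, bx = decode(word)
        # LOADK uses the combined Bx field; MOVE and RETURN use A and B.
        operand = bx if name == "LOADK" else b
        lines.append("%s %d %d" % (name, a, operand))
    return lines

as_posted = "00 00 00 01 01 00 00 1E 00 00 1E 00"
patched = "00 00 00 01 01 00 00 1E 00 00 00 1E"

print(disassemble(as_posted))
print(disassemble(patched))
```

The first two instructions decode the same way in both cases; only the third changes from MOVE 120 0 to RETURN 0 0.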
Now, I have no idea how this could be useful to you, since the output is directly from CC (or LuaJ), but that's the problem.
Note: After I modified ChunkSpy to accept big-endian chunks, it errored on the bytecode as you posted it, but worked fine once the bytecode was modified either the way you suggested or the way I suggested.
Upvotes: 2