Reputation: 51
I am building a decompiler for Lua 5.1. (for study purposes only)
Here is the generated code:
main <test.lua:0,0> (12 instructions, 48 bytes at 008D0520)
0+ params, 2 slots, 0 upvalues, 0 locals, 6 constants, 0 functions
1 [1] LOADK 0 -2 ; 2
2 [1] SETGLOBAL 0 -1 ; plz_help_me
3 [2] LOADK 0 -4 ; 24
4 [2] SETGLOBAL 0 -3 ; oh_no
5 [3] GETGLOBAL 0 -1 ; plz_help_me
6 [3] GETGLOBAL 1 -3 ; oh_no
7 [3] ADD 0 0 1
8 [3] SETGLOBAL 0 -5 ; plz_work
9 [4] GETGLOBAL 0 -6 ; print
10 [4] GETGLOBAL 1 -5 ; plz_work
11 [4] CALL 0 2 1
12 [4] RETURN 0 1
constants (6) for 008D0520:
1 "plz_help_me"
2 2
3 "oh_no"
4 24
5 "plz_work"
6 "print"
locals (0) for 008D0520:
upvalues (0) for 008D0520:
Original Code:
plz_help_me = 2
oh_no = 24
plz_work = plz_help_me + oh_no
print(plz_work)
How to build a decompiler efficiently to generate this code? Should I use AST trees to map the behavior of the code? (opcodes in this case)
Upvotes: 5
Views: 1722
Reputation: 7098
It looks like there is a lua decompiler for lua 5.1 written in C that can be used for study; https://github.com/viruscamp/luadec https://www.lua.org/wshop05/Muhammad.pdf describes how it works.
And on the extremely simple example you give, it comes back with the exactly same thing:
$ ./luadec ../tmp/luac.out
-- Decompiled using luadec 2.2 rev: 895d923 for Lua 5.1 from https://github.com/viruscamp/luadec
-- Command line: ../tmp/luac.out
-- params : ...
-- function num : 0
plz_help_me = 2
oh_no = 24
plz_work = plz_help_me + oh_no
print(plz_work)
There is a debug flag -d
to show a little bit about how it works:
-- Decompiled using luadec 2.2 rev: 895d923 for Lua 5.1 from https://github.com/viruscamp/luadec
-- Command line: -d ../tmp/luac.out
LoopTree of function 0
FUNCTION_STMT=0x2e2f5ec0 prep=-1 start=0 body=0 end=11 out=12 block=0x2e2f5f10
----------------------------------------------
1 LOADK 0 1
SET_SIZE(tpend) = 0
next bool: 0
locals(0):
vpend(0):
tpend(1): 0{2}
Note: The Lua VM in version in 5.1 (and other versions up to the latest 5.4) seems to be stack oriented. This means that in order to compute plz_hep_me + oh_no
, both values need to be pushed onto a stack and then when the ADD
bytecode instruction is seen two entries are popped off of the stack and the result is pushed onto the stack. From the decompiler's standpoint, a tree fragment or "AST" fragment of the addition with its two operands is created.
In the output above, a constant is push onto stack tpend
to which we see 0{2}
. The stuff in brackets seems to hold the output representation. Later on, just before the ADD
opcode you will see two entries on the stack. Entries are separate simply with a space in the debug output.
----------------------------------------------
2 SETGLOBAL 0 0
next bool: 0
locals(0):
vpend(1): -1{plz_help_me=2}
tpend(0):
----------------------------------------------
Seems to set variable plz_help_me
from the top of the stack and push the assignment statement instead.
3 LOADK 0 3
SET_SIZE(tpend) = 0
next bool: 0
locals(0):
vpend(0):
tpend(1): 0{24}
...
At this point I guess the decompiler sees that the stack is 0 SET_SIZE(tpend)=0
or the first statement then is done.
Moving onto the next statement, the constant 24 is then loaded onto tpend
.
In vpend
and tpend
, we see the output as it gets built up.
Looking at the C code, It looks like it loops over functions and within that loops over what it thinks are statements, and within that is driven by the a loop over bytecode instructions but "walking" them based on instruction type and information it saves.
It looks it does build an AST on the fly in a somewhat ad-hoc manner. (By "ad hoc" I simply mean programmer code builds the tree).
Although for compilation especially of optimized code Single Static Assignment (SSA) is a win, in the decompilation direction of high-level bytecode such as is found here, usually this is neither needed nor wanted. SSA maps to finite reusable names to a potentially infinite set.
In a decompiler we want to go the other direction from infinite registers (or even finite registers) back to names the programmer used and here this mapping is already given, so just use it. This is simpler too.
If the same name is used for two conceptually different computations, tracking that is probably wanted. Correcting or improving the style of the source code in the way that the decompiler writer feels might be nice, but usually isn't appreciated. (In fact, in my experience many programmers using a decompiler like this will call it a bug, when technically, it isn't.)
The luadec decompiler breaks things into functions and walks over disassembled instructions in that in an ad hoc way based on specific instructions for various opcodes. After this tree or list of tree nodes is built, an AST if you will (and that is the term this decompiler uses), there seem to be custom print routines for each AST node type.
Personally, I think there is a better way, and one that is slightly different from the one suggested in another answer.
I wrote about decompilation from this novel perspective here. I believe it applies to Lua as well. And older, longer more research oriented paper here.
As with many decompilers, logic instructions introduce control flow, and these strategies which are instruction based and that go down a path early on from the instructions, have a hard time shifting these around. So a pattern matching/parsing/term-rewrite approach where control flow is marked in the instructions such as with addition of pseudo Phi-function instructions (or its equivalent). Dominator regions from a control-flow graph also greatly help here.
Edit:
There is also unluac, a lua decompiler written in Java that can be studied. Although there are very few comments in the code describing how it works, the code it self seems well organized and the functions are pretty small and clear. It seems similar in operation in that there is a part that is bytecode operation based, and a part that is more expression-tree based. It seems to be currently worked on and is the only decompiler that supports lua versions from 5.0 up to 5.4.
Upvotes: 0
Reputation: 9714
Lua VM is a register machine with a nearly unlimited supply of registers, which means you don't have to deal with the consequences of register allocation. It makes the whole thing much more bearable than decompiling, say, x86.
A very convenient intermediate representation for going up the abstraction level would have been an SSA. A trivial transform treating registers as local variable pointers and leaving memory loads as is, followed by an SSA-transform [1], will give you a code suitable for further analysis. Next step will be loop detection (done purely on CFG level), and, helped by SSA, detection of the loop variables and loop invariants. Once done, you'll see that only a handful of common patterns exist, that can be directly translated to higher level loops. Detecting if
and other linear control flow sequences is even easier, once you're in an SSA.
One nice property of an SSA is that you can easily construct high level AST expressions out of it. You have a use count for every SSA variable, so you can simply substitute all the single-use variables (that were not produced by side-effect instructions) in place of their use (and side-effect ones too if you maintain their order). Only multi-use variables will remain.
Of course, you'll never get meaningful local variable names out of this procedure. Globals are preserved.
[1] https://pfalcon.github.io/ssabook/latest/
Upvotes: 2