Reputation: 337
I understand what a JIT compiler does: it uses counters to detect hot code, compiles it, and stores the result in the code cache so that the function does not need to be interpreted the next time it runs. But this brings me to two questions:
First, how does a JIT compiler compile a function that contains calls to other, non-jitted functions? The jitted function, which is native code, has to call an outside function that is still supposed to be executed by the interpreter of the JIT compiler.
The second question is a variant of the first. How does a JIT compiler compile a function that contains a virtual call? It seems that it has to hard-code the vtable lookup procedure into the native code. But the vtable lives in the virtual machine, so there must be a way to access it from the generated native code. How can this be done? If the virtual machine is implemented in a language like C or C++, maybe you can hard-code the memory address of the vtable into the native code. But how can I do the same if the virtual machine is implemented in a managed language whose users are discouraged from taking, or simply cannot take, the address of an object? And even if I can get the address, the object might be moved during the compaction phase of the garbage collector.
Upvotes: 2
Views: 332
Reputation: 56557
- How does a JIT compiler compile a function that contains calls to other, non-jitted functions? The jitted function, which is native code, has to call an outside function that is still supposed to be executed by the interpreter of the JIT compiler.
A call to an external function is not the same as a call to one of your own jitted functions. They are different beasts that have to be treated specially, and the compiler needs a way to distinguish them. In Rust, for example, you mark such functions as `extern "C"`.
Once the compiler can tell them apart, it is a matter of emitting the appropriate `call`, `jmp`, or whatever instruction applies. Of course, you need to know the address of the function (there is no other way to make a call at the native-code level). The problem becomes more nuanced because of calling conventions and data layout (more generally, the ABI), none of which can be detected automatically. One typical way to handle this is to simply assume that the external function follows the OS-specific ABI, and to document that the behaviour is undefined otherwise.
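To make the "address plus agreed ABI" point concrete, here is a minimal sketch using Python's standard `ctypes` module. It is only a model of a call site, not real code generation: `vm_add_one` is a hypothetical VM helper, and the signature `int f(int)` is an assumption that both sides must share, because nothing in the address itself reveals it.

```python
import ctypes

# Hypothetical VM helper that generated code would want to call.
def vm_add_one(x):
    return x + 1

# The signature both sides agree on: int f(int), using the platform's
# default C calling convention.
PROTO = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)

# Wrap the helper in a native-callable thunk. Keep a reference to it so the
# thunk is not freed while its address is still in use.
native_thunk = PROTO(vm_add_one)

# The raw address a code generator would place into a call instruction.
address = ctypes.cast(native_thunk, ctypes.c_void_p).value

# Model of the call site: it knows nothing but the address and the agreed
# signature, and calls through the C ABI.
call_site = PROTO(address)
result = call_site(41)
print(result)
```

If the declared signature did not match what actually lives at that address, the call would misbehave without any diagnostic, which is why the answer falls back on "follow the OS ABI or the behaviour is undefined".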
Note that you can't avoid the ABI; you are generating native code after all. But for internal usage (i.e. when calling your own functions and dealing with your own structures) you are not restricted by anything, and in fact you can invent your own ABI. Believe me, that is not unusual: I'm pretty sure every compiler in the world has its own custom ABI for internal usage.
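As an illustration of an invented internal convention, here is a toy model in which every VM function is called through an entry slot with the signature `entry(vm, args)`. Until a function is compiled, its slot points to a stub that re-enters the interpreter, so already-compiled callers never need to know which kind of callee they have. All names here (`Function`, `VM`, `entry`, the two-opcode bytecode) are made up for the sketch, and Python callables stand in for native code.

```python
class Function:
    """A VM function that all callers reach through its `entry` slot."""

    def __init__(self, bytecode):
        self.bytecode = bytecode
        # Not compiled yet: the slot points to a stub that runs the interpreter.
        self.entry = self._interpreter_stub

    def _interpreter_stub(self, vm, args):
        return vm.interpret(self, args)


class VM:
    def __init__(self):
        self.interpreted_calls = 0

    def interpret(self, fn, args):
        """Tiny stack interpreter supporting only `push` and `add`."""
        self.interpreted_calls += 1
        stack = list(args)
        for op, operand in fn.bytecode:
            if op == "push":
                stack.append(operand)
            elif op == "add":
                right, left = stack.pop(), stack.pop()
                stack.append(left + right)
        return stack.pop()


vm = VM()
inc = Function([("push", 1), ("add", None)])  # inc(x) = x + 1


def compiled_caller(vm, args):
    # Stand-in for jitted code: it always calls through the callee's slot.
    once = inc.entry(vm, args)
    return inc.entry(vm, [once])


before = compiled_caller(vm, [40])            # both calls hit the interpreter
calls_while_interpreted = vm.interpreted_calls

# "Compile" inc by patching its slot; the caller's code does not change.
inc.entry = lambda vm, args: args[0] + 1
after = compiled_caller(vm, [40])             # no interpreter involved now
```

The design choice is the indirection: one extra load per call buys the freedom to swap interpreted and compiled bodies at any time.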
- The second question is a variant of the first. How does a JIT compiler compile a function that contains a virtual call? It seems that it has to hard-code the vtable lookup procedure into the native code. But the vtable lives in the virtual machine, so there must be a way to access it from the generated native code. How can this be done? If the virtual machine is implemented in a language like C or C++, maybe you can hard-code the memory address of the vtable into the native code...
Yes, you bake pointers into the generated code. That would be the ultimate optimization for any static data you want to share between the virtual machine and the generated code. Note that this is of course not as simple as it sounds: you again have to be careful with data layout, calling conventions, and the ABI. Ultimately, the virtual machine is just an external native resource for your generated code.
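A rough sketch of what "baking a pointer" means, again modelled with `ctypes` rather than real machine code. The table layout (two `int f(int)` slots), the slot indices, and the helper `emit_call` are assumptions made for the example. A full virtual call would first load the table pointer from the receiver object; here the table address is treated as a known constant.

```python
import ctypes

# Assumed layout shared by the VM and the generated code:
# a table of two function pointers, each with signature int f(int).
METHOD = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)
VTable = METHOD * 2
AREA_SLOT, PERIMETER_SLOT = 0, 1


def square_area(side):
    return side * side


def square_perimeter(side):
    return 4 * side


# The VM owns the thunks and the table; both must stay alive (and must not
# move) for as long as any generated code refers to them.
thunks = [METHOD(square_area), METHOD(square_perimeter)]
square_vtable = VTable(*thunks)

# The constant a code generator would embed in the emitted code.
vtable_address = ctypes.addressof(square_vtable)


def emit_call(table_address, slot):
    """Model of a generated call site holding a baked-in table address."""
    def call_site(arg):
        table = VTable.from_address(table_address)  # reinterpret raw memory
        return table[slot](arg)                     # indexed indirect call
    return call_site


call_area = emit_call(vtable_address, AREA_SLOT)
call_perimeter = emit_call(vtable_address, PERIMETER_SLOT)
area = call_area(5)
perimeter = call_perimeter(5)
```

The call sites only work because the layout assumption holds and the table never moves, which is exactly the care with data layout the paragraph above warns about.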
...but how can I do the same if the virtual machine is implemented in a managed language whose users are discouraged from taking, or simply cannot take, the address of an object? And even if I can get the address, the object might be moved during the compaction phase of the garbage collector.
In order to share managed data between two runtimes you again need to understand the data layout, the calling conventions, and in general the ABI of your language. This is significantly harder for managed languages, if not impossible (unless you develop the language yourself), and often those languages don't even expose or guarantee any ABI of their own. Those ABIs also change from version to version (unlike OS ABIs, which rarely change, and when they do, typically in a backwards-compatible way). Moreover, some languages have features that are just too hard to accommodate. Take Java, for example: you can choose between different garbage collectors, but the code you generate, being native code, has to be aware of the garbage collector and of how to interact with it at the native level.
So let's be honest: no one is going to write a native JIT compiler in Java, or in any other language with a sufficiently sophisticated runtime, unless the compiler is not supposed to interact with the Java VM at all. If it is supposed to interact with it, then it is a lot easier to target the language's bytecode instead of native code.
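To show why a moving collector is a problem for baked-in addresses, here is a toy model. It also shows one possible workaround, a handle table that the collector updates; that workaround is my assumption for illustration, not something a particular runtime is claimed to do here. Slot indices play the role of raw addresses, and `Heap`, `allocate`, `free`, and `compact` are hypothetical names.

```python
class Heap:
    """Toy heap: a slot index plays the role of a raw memory address."""

    def __init__(self):
        self.slots = []    # object storage; None marks a dead object
        self.handles = []  # handle -> current slot index

    def allocate(self, value):
        self.slots.append(value)
        self.handles.append(len(self.slots) - 1)
        return len(self.handles) - 1  # the handle never changes

    def free(self, handle):
        self.slots[self.handles[handle]] = None
        self.handles[handle] = None

    def compact(self):
        """Slide live objects together and fix up the handle table."""
        new_slots, moved_to = [], {}
        for old_index, value in enumerate(self.slots):
            if value is not None:
                moved_to[old_index] = len(new_slots)
                new_slots.append(value)
        self.slots = new_slots
        self.handles = [None if h is None else moved_to[h]
                        for h in self.handles]

    def address_of(self, handle):
        return self.handles[handle]

    def load(self, handle):
        return self.slots[self.handles[handle]]


heap = Heap()
garbage = heap.allocate("temporary object")
vtable = heap.allocate("vtable for Square")

stale_address = heap.address_of(vtable)  # what naive code would bake in

heap.free(garbage)
heap.compact()                           # the vtable object moves

current_address = heap.address_of(vtable)
value_via_handle = heap.load(vtable)
```

After compaction the baked-in "address" no longer points at the object, while the handle still resolves correctly. Generated code that wants raw addresses instead would need the collector to know about every embedded pointer and patch it, which is the kind of native-level cooperation with the GC the paragraph above describes.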
Upvotes: 2