
Reputation: 13040

Why does JIT'ed code consume so much more memory than either compiled or interpreted code?

Compiled code such as C consumes little memory.

Interpreted code such as Python consumes more memory, which is understandable.

With JIT, a program is (selectively) compiled into machine code at run time. So shouldn't the memory consumption of a JIT'ed program be somewhere between that of a compiled and an interpreted program?

Instead, a JIT'ed program (such as PyPy) consumes several times more memory than the equivalent interpreted program (such as Python). Why?

Upvotes: 8

Views: 2892

Answers (2)

Ben

Reputation: 71495

Be careful about what kind of memory usage you're talking about.

Code compiled to C uses comparatively little memory for the compiled machine code itself.

I would expect Python bytecode for a given algorithm to actually be smaller than the compiled C code for a similar algorithm, because Python bytecode operations are much higher level so there's often fewer of them to get a given thing done. But a Python program will also have the compiled code of the Python interpreter in memory, which is quite a large and complex program in itself. Plus a typical Python program will have much more of the standard library in memory than a typical C program (and a C program can strip out all the functions it doesn't actually use if it's statically linked, and if it's dynamically linked then it shares the compiled code with any other process in memory that uses it).
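
The claim that high-level bytecode is compact is easy to check from CPython itself. This is an illustrative sketch, not a rigorous benchmark: it uses the standard `dis` module to count the bytecode instructions for a small summation function, which compiles to only a few dozen ops, where the equivalent optimized C would expand into many more machine instructions (loads, stores, branch setup, and so on).

```python
import dis

# A toy function: CPython compiles it to a short list of high-level
# bytecode operations (each one doing a lot of work in the interpreter).
def total(items):
    s = 0
    for x in items:
        s += x
    return s

ops = list(dis.get_instructions(total))
print(len(ops))  # a few dozen instructions at most
```

The trade-off is exactly what the answer describes: each bytecode op is small, but executing it requires the whole interpreter to be resident in memory.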

PyPy then has on top of this the machine code of the JIT compiler, as well as the machine code generated from the Python bytecode (which doesn't go away, it has to be kept around as well). So your intuition (that a JITed system "should" consume memory somewhere between that of a compiled language and a fully interpreted language) isn't correct anyway.

But on top of all of those you've got the actual memory used by the data structures the program operates on. This varies immensely, and has little to do with whether the program is compiled ahead of time, or interpreted, or interpreted-and-JITed. Some compiler optimisations will reduce memory usage (whether they're applied ahead of time or just in time), but many actually trade off memory usage to gain speed. For programs that manipulate any serious amount of data it will completely dwarf the memory used by the code itself, anyway.
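
To make the "data dwarfs code" point concrete, here is a small CPython illustration (shallow sizes only, measured with `sys.getsizeof`): the bytecode that builds a million-element list is a handful of bytes, while the list it builds is megabytes of pointer array alone, before even counting the boxed int objects those pointers reference.

```python
import sys

# Compile a tiny expression and compare the size of its bytecode
# with the size of the data structure it produces.
builder = compile("list(range(1_000_000))", "<demo>", "eval")
data = eval(builder)

code_bytes = len(builder.co_code)   # bytecode: tens of bytes
list_bytes = sys.getsizeof(data)    # list object alone: several MB of pointers

print(code_bytes, list_bytes)
```

Note that `sys.getsizeof` on a list does not include the referenced int objects, so the real data footprint is larger still; the gap versus the code size only grows.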

When you say:

Instead, a JIT'ed program (such as PyPy) consumes several times more memory than the equivalent interpreted program (such as Python). Why?

What programs are you thinking of? If you've actually done any comparisons, I'm guessing from your question that they would be between PyPy and CPython. I know many of PyPy's data structures are actually smaller than CPython's, but again, that has nothing to do with the JIT.

If the dominant memory usage of a program is the code itself, then a JIT compiler adds huge memory overhead (for the compiler itself, and the compiled code), and can't do very much at all to "win back" memory usage through optimisation. If the dominant memory usage is program data structures, then I wouldn't be at all surprised to find PyPy using significantly less memory than CPython, whether or not the JIT was enabled.


There's not really a straightforward answer to your "Why?" because the statements in your question are not straightforwardly true. Which system uses more memory depends on many factors; the presence or absence of a JIT compiler is one factor, but it isn't always significant.

Upvotes: 5

Necrolis

Reputation: 26171

Tracing JIT compilers take quite a bit more memory because they must keep not only the bytecode for the VM, but also the directly executable machine code generated from it. This is only half the story, however.

Most JITs will also keep a lot of metadata about the bytecode (and even the machine code) that lets them determine what needs to be JIT'ed and what can be left alone. Tracing JITs (such as LuaJIT) also create trace snapshots, which are used to fine-tune code at run time, performing things like loop unrolling or branch reordering.
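
As a toy sketch of that bookkeeping (this is not any real JIT's implementation; the names `ToyJIT` and `HOT_THRESHOLD` are made up for illustration), a tracing JIT keeps per-location execution counters to decide when a region is hot, and then caches the "compiled" version alongside the original bytecode. Counters, trace cache, and original code all coexist in memory:

```python
HOT_THRESHOLD = 3  # invented value: how many runs before a loop is "hot"

class ToyJIT:
    def __init__(self):
        self.counters = {}  # metadata: how often each loop header executed
        self.traces = {}    # cache: "compiled" versions kept *alongside* the bytecode

    def run_loop(self, loop_id, body, n):
        self.counters[loop_id] = self.counters.get(loop_id, 0) + 1
        if loop_id in self.traces:
            return self.traces[loop_id](n)  # fast path: use cached trace
        if self.counters[loop_id] >= HOT_THRESHOLD:
            # "Compile" the loop: here just build a reusable closure once.
            self.traces[loop_id] = lambda m: [body(i) for i in range(m)]
        return [body(i) for i in range(n)]   # slow interpreted path

jit = ToyJIT()
for _ in range(5):
    out = jit.run_loop("L1", lambda i: i * 2, 4)
print(out, "L1" in jit.traces)
```

Every dictionary here stands in for a real structure a JIT must allocate on top of the program itself, which is where the extra memory goes.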

Some also keep caches of commonly used code segments or fast lookup buffers to speed up creation of JIT'ed code (LuaJIT does this via DynAsm; done correctly, as in DynAsm's case, this can actually help reduce memory usage).

Memory usage also depends greatly on the JIT engine employed and the nature of the language it compiles (strongly vs. weakly typed). Some JITs employ advanced techniques such as SSA-based register allocators and variable liveness analysis; these sorts of optimizations consume memory as well, along with more common ones like loop-invariant hoisting.

Upvotes: 8
