Reputation: 511

How exactly does the Java interpreter or any interpreter work?

I have been figuring out the exact working of an interpreter, have googled around and have come up with some conclusion, just wanted it to be rectified by someone who can give me a better understanding of the working of interpreter.

So what i have understood is:

An interpreter is a software program that converts code from high level language to machine format.
speaking specifically about java interpreter, it gets code in binary format (which is earlier translated by java compiler from source code to bytecode).
now platform for a java interpreter is the JVM, in which it runs, so basically it is going to produce code which can be run by JVM.
so it takes the bytecode produces intermediate code and the target machine code and gives it to JVM.
JVM in turns executes that code on the OS platform in which JVM is implemented or being run.

Now i am still not clear with the sub process that happens in between i.e.

interpreter produces intermediate code.
interpreted code is then optimized.
then target code is generated
and finally executed.

Some more questions:

so is the interpreter alone responsible for generating target code ? and executing it ?
and does executing means it gets executed in JVM or in the underlying OS ?

Upvotes: 13

Answers (5)

Gray

Reputation: 116878

An interpreter is a software program that converts code from high level language to machine format.

No. That's a compiler. An interpreter is a computer program that executes the instructions written in a language directly. This is different from a compiler that converts a higher level language into a lower language. The C compiler goes from C to assembly code with the assembler (another type of compiler) translates from assembly to machine code -- modern C compilers do both steps to go from C to machine code.

In Java, the java compiler does code verification and converts from Java source to byte-code class files. It also does a number of small processing tasks such as pre-calculation of constants (if possible), caching of strings, etc..

now platform for a java interpreter is the JVM, in which it runs, so basically it is going to produce code which can be run by JVM.

The JVM operates on the bytecode directly. The java interpreter is integrated so closely with the JVM that they shouldn't really be thought of as separate entities. What also is happening is a crap-ton of optimization where bytecode is basically optimized on the fly. This makes calling it just an interpreter inadequate. See below.

so it takes the bytecode produces intermediate code and the target machine code and gives it to JVM.

The JVM is doing these translations.

JVM in turns executes that code on the OS platform in which JVM is implemented or being run.

I'd rather say that the JVM uses the bytecode, optimized user code, the java libraries which include java and native code, in conjunction with OS calls to execute java applications.

now i am still not clear with the sub process that happens in between i.e. 1. interpreter produces intermediate code. 2. interpreted code is then optimized. 3. then target code is generated 4. and finally executed.

The Java compiler generates bytecode. When the JVM executes the code, steps 2-4 happen at runtime inside of the JVM. It is very different than C (for example) which has these separate steps being run by different utilities. Don't think about this as "subprocesses", think about it as modules inside of the JVM.

so is the interpreter alone responsible for generating target code ? and executing it ?

Sort of. The JVM's interpreter by definition reads the bytecode and executes it directly. However, in modern JVMs, the interpreter works in tandem with the Just-In-Time compiler (JIT) to generate native code on the fly so that the JVM can have your code execute more efficiently.

In addition, there are post-processing "compilation" stages which analyze the generated code at runtime so that native code can be optimized by inlining often-used code blocks and through other mechanisms. This is the reason why the JVM load spikes so high on startup. Not only is it loading in the jars and class files, but it is in effect doing a cc -O3 on the fly.

and does executing means it gets executed in JVM or in the underlying OS ?

Although we talk about the JVM executing the code, this is not technically correct. As soon as the byte-code is translated into native code, the execution of the JVM and your java application is done by the CPU and the rest of the hardware architecture.

The Operating System is the substrate that that does all of the process and resource management so the programs can efficiently share the hardware and execute efficiently. The OS also provides the APIs for applications to easily access the disk, network, memory, and other hardware and resources.

Upvotes: 14

Joop Eggen

Reputation: 109547

There are two ways of executing a program.

By way of a compiler: this parses a text in the programming language (say .c) to machine code, on Windows .exe. This can then be executed independent of the compiler.

This compilation can be done by compiling several .c files to several object files (intermediate products), and then linking them into a single application or library.

By way of an interpreter: this parses a text in the programming language (say .java) and "immediately" executes the program.

With java the approach is a bit hybrid/stacked: the java compiler javac compiles .java to .class files, and possibles zips those in .jar (or .war, .ear ...). The .class files consist of a more abstract byte code, for an abstract stack machine.

Then the java runtime java (call JVM, java virtual machine, or byte code interpreter) can execute a .class/.jar. This is in fact an interpreter of java byte code.

Nowadays it also translates (parts) of the byte code at run time to machine code. This is also called a just-in-time compiler for byte code to machine code.

In short: - a compiler just creates code; - an interpreter immediately executes.

An interpreter will loop over parsed commands / a high level intermediate code, and interprete every command with a piece of code. Indirect an in principle slow.

Upvotes: 1

Danilo M. Oliveira

Reputation: 728

I'll answer based on my experience on creating a DSL.

C is compiled because you run pass the source code to the gcc and runs the stored program in machine code.

Python is interpreted because you run programs by passing the program source to the interpreter. The interpreter reads the source file and executes it.

Java is a mix of both because you "compile" the Java file to bytecode, then invokes the JVM to run it. Bytecode isn't machine code, it needs to be interpreted by the JVM. Java is in a tier between C and Python because you cannot do fancy things like a "eval" (evaluating chunks of code or expressions at runtime as in Python). However, Java has reflection abilities that are impossible to a C program to have. In short, the design of Java runtime being in a intermediary tier between a pure compiled and a interpreted language gives the best (and the worst) of the two words in terms of performance and flexibility.

However, Python also has a virtual machine and it's own bytecode format. The same applies to Perl, Lua, etc. Those interpreters first converts a source file to a bytecode, then they interpret the bytecode.

I always wondered why doing this is necessary, until I made my own interpreter for a simulation DSL. My interpreter does a lexical analysis (break a source in tokens), converts it to a abstract syntax tree, then it evaluates the tree by traversing it. For software engineering sake I'm using some design patterns and my code heavily uses polymorphism. This is very slow in comparison to processing a efficient bytecode format that mimics a real computer architecture. My simulations would be way faster if I create my own virtual machine or use a existent one. For evaluating a long numeric expression, for instance, it'll be faster to translate it to something similar to assembly code than processing a branch of a abstract tree, since it requires calling a lot of polymorphic methods.

Upvotes: 3

Preston Garno

Reputation: 1215

Giving a 1000 foot view which will hopefully clear things up:

There are 2 main steps to a java application: compilation, and runtime. Each process has very different functions and purposes. The main processes for both are outlined below:

Compilation

This is (normally) executed by [com.sun.tools.javac][1] usually found in the tools.jar file, traditionally in your $JAVA_HOME - the same place as java.jar, etc.
The goal here is to translate .java source files into .class files which contain the "recipe" for the java runtime environment.

Compilation steps:

Parsing: the files are read, and stripped of their 'boundary' syntax characters, such as curly braces, semicolons, and parentheses. These exists to tell the parser which java object to translate each source component into (more about this in the next point).
AST creation: The Abstract Syntax Tree is how a source file is represented. This is a literal "tree" data structure, and the root class for this is [com.sun.tools.JCTree][3]. The overall idea is that there is a java object for each Expression and each Statement. At this point in time relatively little is known about actual "types" that each represent. The only thing that is checked for at the creation of the AST is literal syntax
Desugar: This is where for loops and other syntactical sugar are translated into simpler form. The language is still in 'tree' form and not bytecode so this can easily happen
Type checking/Inference: Where the compiler gets complex. Java is a static language, so the compiler has to go over the AST using the Visitor Pattern and figure out the types of everything ahead of tim and makes sure that at runtime everything (well, almost) will be legal as far as types, method signatures, etc. goes. If something is too vague or invalid, compilation fails.
Bytecode: Control flow is checked to make sure that the program execution logic is valid (no unreachable statements, etc.) If everything passes the checks without errors, then the AST is translated into the bytecodes that the program represents.
.class file writing: at this point, the class files are written. Essentially, the bytecode is a small layer of abstraction on top of specialized machine code. This makes it possible to port to other machines/CPU structures/platforms without having to worry about the relatively small differences between them.

Runtime

There is a different Runtime Environment/Virtual Machine implementation for each computer platform. The Java APIs are universal, but the runtime environment is an entirely separate piece of software.
JRE only knows how to translate bytecode from the class files into machine code compatible with the target platform, and that is also highly optimized for the respective platform.
There are many different runtime/vm implementations, but the most popular one is the Hotspot VM.
The VM is incredibly complex and optimizes your code at runtime. Startup times are slow but it essentially "learns" as it goes.
This is the 'JIT' (Just-in-time) concept in action - the compiler did all of the heavy lifting by checking for correct types and syntax, and the VM simply translates and optimizes the bytecode to machine code as it goes.

Also...

The Java compiler API was standardized under JSR 199. While not exactly falling under same thing (can't find the exact JLS), many other languages and tools leverage the standardized compilation process/API in order to use the advanced JVM (runtime) technology that Oracle provides, while allowing for different syntax.
- See Scala, Groovy, Kotlin, Jython, JRuby, etc. All of these leverage the Java Runtime Environment because they translate their different syntax to be compatible with the Java compiler API! It's pretty neat - anyone can write a high-performance language with whatever syntax they want because of the decoupling of the two. There's adaptations for almost every single language for the JVM

Upvotes: 8

Stephen C

Reputation: 718798

1) An interpreter is a software program that converts code from high level language to machine format.

Incorrect. An interpreter is a program that runs a program expressed in some language that is NOT the computer's native machine code.

There may be a step in this process in which the source language is parsed and translated to an intermediate language, but this is not a fundamental requirement for an interpreter. In the Java case, the bytecode language has been designed so that neither parsing or a distinct intermediate language are necessary.

2) speaking specifically about java interpreter, it gets code in binary format (which is earlier translated by java compiler from source code to bytecode).

Correct. The "binary format" is Java bytecodes.

3) now platform for a java interpreter is the JVM, in which it runs, so basically it is going to produce code which can be run by JVM.

Incorrect. The bytecode interpreter is part of the JVM. The interpreter doesn't run on the JVM. And the bytecode interpreter doesn't produce anything. It just runs the bytecodes.

4) so it takes the bytecode produces intermediate code and the target machine code and gives it to JVM.

Incorrect.

5) JVM in turns executes that code on the OS platform in which JVM is implemented or being run.

Incorrect.

The real story is this:

The JVM has a number of components to it.
One component is the bytecode interpreter. It executes bytecodes pretty much directly¹. You can think of the interpreter as an emulator for an abstract computer whose instruction set is bytecodes.
A second component is the JIT compiler. This translates bytecodes into the target machine's native machine code so that it can be executed by the target hardware.

^{1 - A typical bytecode interpreter does some work to map abstract stack frames and object layouts to concrete ones involving target-specific sizes and offsets. But to call this an "intermediate code" is a stretch. The interpreter is really just enhancing the bytecodes.}

Upvotes: 9