DaviD.

Reputation: 475

Are some Java bytecode instructions unnecessarily typed?

As noted in https://docs.oracle.com/javase/specs/jvms/se14/html/jvms-2.html#jvms-2.11.1, encoding operand types into opcodes comes at a cost:

Given the Java Virtual Machine's one-byte opcode size, encoding types into opcodes places pressure on the design of its instruction set. If each typed instruction supported all of the Java Virtual Machine's run-time data types, there would be more instructions than could be represented in a byte.

Thus, it seems one should only do this for instructions where type information of the operands is required or enables optimizations. For example, differentiation between iadd and fadd is required because addition for integers and floats is implemented differently. And I don't know exactly why there are different instructions for loading a boolean and an int from an array (baload and iaload, respectively), but I can at least imagine some performance reasons.

However, why are there different instructions for storing an int (istore) and a float (fstore) into a local variable? Shouldn't they be implemented in absolutely the same way?

This answer https://stackoverflow.com/a/2638143 says typed instructions are required for the bytecode verifier. But is this really necessary? In a method, all data flows from the method's parameters (for which the types are known) and from class fields (for which the types are also known) to other class fields and to the return value. Thus, since the types for inputs and outputs are known, can't we reconstruct any missing types for the instructions? In fact, isn't this what the bytecode verifier does anyway, since it has to check the types, i.e., it must know which types are expected?

In short: What would break if we were to combine istore and fstore into a single instruction? Would performance or portability suffer? Would bytecode verification stop working?

Upvotes: 2

Views: 308

Answers (2)

apangin

Reputation: 98284

istore and fstore are implemented differently on pretty much every JVM and every architecture I have worked with.

For example, in the HotSpot JVM's x64 interpreter, istore_0 is implemented as

mov dword ptr [r14], eax

whereas fstore_0 is implemented as

movss dword ptr [r14], xmm0

The interpreter caches the top-of-stack value in a register, and there are different registers for integer and floating-point values.

Similarly, baload and iaload are implemented differently: they use different offset multipliers (1 and 4, respectively) and require different machine instructions to load an 8-bit value vs. a 32-bit value.
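To make the difference concrete, here is a rough sketch in Java of what the two loads have to do. It is not HotSpot code: the class name, the flat byte[] "heap", and the little-endian layout are all assumptions made for illustration.

```java
// Hypothetical sketch, not HotSpot code: a flat byte[] stands in for memory.
final class ArrayLoadSketch {
    // baload: element size is 1, and the single byte is sign-extended to an int.
    static int baload(byte[] heap, int base, int index) {
        return heap[base + index * 1]; // byte -> int widening sign-extends
    }

    // iaload: element size is 4, and four bytes are combined into one int
    // (little-endian order is an assumption of this sketch).
    static int iaload(byte[] heap, int base, int index) {
        int off = base + index * 4;
        return (heap[off] & 0xFF)
             | (heap[off + 1] & 0xFF) << 8
             | (heap[off + 2] & 0xFF) << 16
             | (heap[off + 3] & 0xFF) << 24;
    }
}
```

With the same base and index, the two methods read different addresses and a different number of bytes, which is why one generic "array load" could not be used without knowing the element type.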

As you noticed, some type information can be derived from data flow analysis, which the bytecode verifier does anyway. But in order to use this information at run time, the stack slots and local variables would need to be tagged with the corresponding type somehow, and the merged bytecode instructions would need to read this tag at run time and dispatch on it. Of course, this would be suboptimal, both in run-time performance and in memory usage.
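A minimal sketch of that trade-off, as a toy model in Java. Everything here is hypothetical: intTos and floatTos stand in for eax and xmm0, and the tag fields are the extra state a merged store/load pair would need. It is not how any real JVM is written.

```java
// Toy model, not a real interpreter: all names are illustrative assumptions.
final class TaggedSlotSketch {
    static final byte INT = 0, FLOAT = 1;

    int intTos;                       // stands in for eax
    float floatTos;                   // stands in for xmm0
    final int[] locals = new int[4];  // raw 32-bit local variable slots

    // Extra state that only the merged (untyped) design needs.
    byte tosTag;
    final byte[] localTags = new byte[4];

    // Typed design: the opcode says which register to read. No tag, no branch.
    void istore(int slot) { locals[slot] = intTos; }
    void fstore(int slot) { locals[slot] = Float.floatToRawIntBits(floatTos); }

    // Merged design: read the tag, branch on it, and record it for later loads.
    void store(int slot) {
        localTags[slot] = tosTag;
        if (tosTag == INT) {
            locals[slot] = intTos;
        } else {
            locals[slot] = Float.floatToRawIntBits(floatTos);
        }
    }

    // The matching merged load must consult the slot's tag to pick a register.
    void load(int slot) {
        tosTag = localTags[slot];
        if (tosTag == INT) {
            intTos = locals[slot];
        } else {
            floatTos = Float.intBitsToFloat(locals[slot]);
        }
    }
}
```

The typed methods are a single move each, while the merged ones pay for a tag write, a tag read, and a branch, plus one extra byte per slot.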

Upvotes: 7

Antimony

Reputation: 39451

I think you are correct about loads and stores not needing to be typed. Not being one of the original designers of Java myself, I can only speculate about why things were designed that way. But here's my guess.

I think that when Java was first designed, it was designed with interpretation in mind, and it's possible that they thought that loads and stores would need to be implemented differently for ints and floats. It's also possible they wanted to be able to run without verification (in fact, it is still possible to disable bytecode verification today). Lastly, while it is technically possible to still do the same bytecode verification when load and store instructions are untyped, it would make things slightly more complicated.

A simple proof that verification could be done with untyped loads and stores is that the local variable table behaves similarly to the operand stack, and there are untyped instructions that operate on the operand stack (dup, swap, etc.).
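As a rough illustration of that idea, here is a toy type checker in Java for a made-up instruction set with untyped STORE, LOAD, and DUP. The opcode names and the class are assumptions of this sketch, and it only handles straight-line code; a real verifier also has to merge types at branch targets, which this leaves out.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Toy checker for a hypothetical instruction set; straight-line code only.
final class UntypedVerifierSketch {
    enum Type { INT, FLOAT, UNSET }
    enum Op { ICONST, FCONST, DUP, STORE, LOAD, IADD, FADD }

    // args[pc] is the local variable index for STORE/LOAD and ignored otherwise.
    static boolean verify(Op[] ops, int[] args, int maxLocals) {
        Type[] locals = new Type[maxLocals];
        Arrays.fill(locals, Type.UNSET);
        Deque<Type> stack = new ArrayDeque<>();
        for (int pc = 0; pc < ops.length; pc++) {
            switch (ops[pc]) {
                case ICONST: stack.push(Type.INT); break;
                case FCONST: stack.push(Type.FLOAT); break;
                case DUP:    // untyped: copies whatever type is on top
                    if (stack.isEmpty()) return false;
                    stack.push(stack.peek());
                    break;
                case STORE:  // untyped: the local takes the type of the popped value
                    if (stack.isEmpty()) return false;
                    locals[args[pc]] = stack.pop();
                    break;
                case LOAD:   // untyped: pushes whatever type the local holds
                    if (locals[args[pc]] == Type.UNSET) return false;
                    stack.push(locals[args[pc]]);
                    break;
                case IADD:
                    if (!popTwo(stack, Type.INT)) return false;
                    stack.push(Type.INT);
                    break;
                case FADD:
                    if (!popTwo(stack, Type.FLOAT)) return false;
                    stack.push(Type.FLOAT);
                    break;
            }
        }
        return true;
    }

    private static boolean popTwo(Deque<Type> stack, Type expected) {
        if (stack.size() < 2) return false;
        return stack.pop() == expected && stack.pop() == expected;
    }
}
```

In this model the type stored in a local is tracked exactly like the type of a stack entry, so a type error still surfaces at the first typed instruction (IADD or FADD here) that consumes the wrong kind of value.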

Upvotes: 2
