Are some Java bytecode instructions unnecessarily typed?

Question

As noted in https://docs.oracle.com/javase/specs/jvms/se14/html/jvms-2.html#jvms-2.11.1, encoding operand types into opcodes comes at a cost:

Given the Java Virtual Machine's one-byte opcode size, encoding types into opcodes places pressure on the design of its instruction set. If each typed instruction supported all of the Java Virtual Machine's run-time data types, there would be more instructions than could be represented in a byte.

Thus, it seems one should only do this for instructions where type information of the operands is required or enables optimizations. For example, differentiation between iadd and fadd is required because addition for integers and floats is implemented differently. And I don't know exactly why there are different instructions for loading a boolean and an int from an array (baload and iaload, respectively), but I can at least imagine some performance reasons.

However, why are there different instructions for storing an int (istore) and a float (fstore) into a local variable? Shouldn't they be implemented in absolutely the same way?

This answer https://stackoverflow.com/a/2638143 says typed instructions are required for the bytecode verifier. But is this really necessary? In a method, all data flows from the method's parameters (for which the types are known) and from class fields (for which the types are also known) to other class fields and to the return value. Thus, since the types for inputs and outputs are known, can't we reconstruct any missing types for the instructions? In fact, isn't this what the bytecode verifier does anyway, since it has to check the types, i.e., it must know which types are expected?

In short: What would break if we were to combine istore and fstore into a single instruction? Would performance or portability suffer? Would bytecode verification stop working?

apangin · Accepted Answer

istore and fstore are implemented differently on pretty much every JVM and every architecture I have worked with.

For example, in the HotSpot JVM x64 interpreter, istore_0 is implemented as

mov dword ptr [r14], eax

whereas fstore_0 is implemented as

movss dword ptr [r14], xmm0

The interpreter caches the top-of-stack value in a register, and it uses different registers for integer and floating-point values (eax and xmm0 in the snippets above).

Similarly, baload and iaload are implemented differently: they use different offset multipliers (1 and 4 respectively) and require different machine instructions to load an 8-bit value vs. a 32-bit value.
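
To make the offset multiplier concrete, here is a rough Java sketch of the address arithmetic. It is only an illustration: the flat ByteBuffer "heap", the class name ArrayLoadSketch, and the method signatures are assumptions made for this example, not how HotSpot actually lays out or accesses arrays.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ArrayLoadSketch {
    // Hypothetical model: 'heap' is flat memory, 'base' is the offset of element 0.

    // baload: scale the index by 1 and do an 8-bit load, sign-extended to int.
    static int baload(ByteBuffer heap, int base, int index) {
        return heap.get(base + index * 1);
    }

    // iaload: scale the index by 4 and do a 32-bit load.
    static int iaload(ByteBuffer heap, int base, int index) {
        return heap.getInt(base + index * 4);
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        heap.putInt(0, 0x11223344);
        heap.putInt(4, 0xFFFFFF80);

        // Same base and index, but different bytes are read.
        System.out.println(baload(heap, 0, 1)); // byte at offset 1: 0x33, prints 51
        System.out.println(iaload(heap, 0, 1)); // int at offset 4, prints -128
    }
}
```

With the same base and index, the two loads touch different addresses and read different widths, so a single untyped "aload element" instruction would first have to find out the element size.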

As you noticed, some type information can be derived from data flow analysis, which the bytecode verifier does anyway. But in order to use this information at run time, the stack slots and local variables would need to be tagged with the corresponding type, and the merged bytecode instructions would need to read this tag at run time and dispatch on it. Of course, this would be suboptimal in both run-time performance and memory usage.
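
A toy Java model of that trade-off follows. Everything in it is hypothetical: intTos and floatTos stand in for the integer and floating-point top-of-stack registers, tosTag is the extra run-time tag a merged instruction would need, and store is the imagined merged instruction. None of these names exist in a real JVM.

```java
class TaggedStoreSketch {
    // Hypothetical tag; a real JVM keeps no such per-value tag at run time.
    enum Tag { INT, FLOAT }

    static int intTos;      // stands in for the integer top-of-stack register
    static float floatTos;  // stands in for the floating-point top-of-stack register
    static Tag tosTag;      // extra state needed only by the merged instruction

    // Simulated local variable slots, 32 raw bits each.
    static final int[] locals = new int[4];

    // Typed instructions: the opcode itself says which register to read.
    static void istore(int slot) { locals[slot] = intTos; }
    static void fstore(int slot) { locals[slot] = Float.floatToRawIntBits(floatTos); }

    // Imagined merged instruction: must read the tag and branch on it.
    static void store(int slot) {
        if (tosTag == Tag.INT) {
            locals[slot] = intTos;
        } else {
            locals[slot] = Float.floatToRawIntBits(floatTos);
        }
    }

    public static void main(String[] args) {
        intTos = 42;
        istore(0);
        floatTos = 1.5f;
        fstore(1);
        tosTag = Tag.FLOAT;
        store(2);
        System.out.println(locals[0]);                       // prints 42
        System.out.println(Float.intBitsToFloat(locals[1])); // prints 1.5
        System.out.println(Float.intBitsToFloat(locals[2])); // prints 1.5
    }
}
```

The merged store gives the same result as fstore here, but only because every instruction that pushes a value also maintains tosTag, and every store pays for an extra read and branch.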
