Xyene

Reputation: 2364

Illegal Opcodes in the JVM

While developing a library that performs operations on JVM bytecode, I recently came across some opcodes for which I have found no documentation, yet which are recognized by the JVM reference implementation. I found a list of them:

BREAKPOINT = 202;
LDC_QUICK = 203;
LDC_W_QUICK = 204;
LDC2_W_QUICK = 205;
GETFIELD_QUICK = 206;
PUTFIELD_QUICK = 207;
GETFIELD2_QUICK = 208;
PUTFIELD2_QUICK = 209;
GETSTATIC_QUICK = 210;
PUTSTATIC_QUICK = 211;
GETSTATIC2_QUICK = 212;
PUTSTATIC2_QUICK = 213;
INVOKEVIRTUAL_QUICK = 214;
INVOKENONVIRTUAL_QUICK = 215;
INVOKESUPER_QUICK = 216;
INVOKESTATIC_QUICK = 217;
INVOKEINTERFACE_QUICK = 218;
INVOKEVIRTUALOBJECT_QUICK = 219;
NEW_QUICK = 221;
ANEWARRAY_QUICK = 222;
MULTIANEWARRAY_QUICK = 223;
CHECKCAST_QUICK = 224;
INSTANCEOF_QUICK = 225;
INVOKEVIRTUAL_QUICK_W = 226;
GETFIELD_QUICK_W = 227;
PUTFIELD_QUICK_W = 228;
IMPDEP1 = 254;
IMPDEP2 = 255;

They appear to be variants of the similarly named standard instructions, but with different opcode values. After trawling through page after page of Google results, I came across a mention of the LDC*_QUICK opcodes in this document.

Quote from it on the LDC_QUICK opcode:

Operation Push item from constant pool

Forms ldc_quick = 203 (0xcb)

Stack ... => ..., item

Description The index is an unsigned byte that must be a valid index into the constant pool of the current class (§3.6). The constant pool item at index must have already been resolved and must be one word wide. The item is fetched from the constant pool and pushed onto the operand stack.

Notes The opcode of this instruction was originally ldc. The operand of the ldc instruction is not modified.

That seemed interesting, so I decided to try it out. LDC_QUICK appears to have the same format as LDC, so I changed an LDC opcode to an LDC_QUICK one. This failed, although the JVM evidently recognized the opcode. When I tried to run the modified file, the JVM rejected it with the following output:

Exception in thread "main" java.lang.VerifyError: Bad instruction: cc
Exception Details:
  Location:
    Test.main([Ljava/lang/String;)V @9: fast_bgetfield
  Reason:
    Error exists in the bytecode
  Bytecode:
    0000000: bb00 0559 b700 064c 2bcc 07b6 0008 572b
    0000010: b200 09b6 000a 5710 0ab8 000b 08b8 000c
    0000020: 8860 aa00 0000 0032 0000 0001 0000 0003
    0000030: 0000 001a 0000 0022 0000 002a b200 0d12
    0000040: 0eb6 000f b200 0d12 10b6 000f b200 0d12
    0000050: 11b6 000f bb00 1259 2bb6 0013 b700 14b8
    0000060: 0015 a700 104d 2cb6 0016 b200 0d12 17b6
    0000070: 000f b1
  Exception Handler Table:
    bci [84, 98] => handler: 101
  Stackmap Table:
    append_frame(@60,Object[#41])
    same_frame(@68)
    same_frame(@76)
    same_frame(@84)
    same_locals_1_stack_item_frame(@101,Object[#42])
    same_frame(@114)

        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Unknown Source)
        at java.lang.Class.getMethod0(Unknown Source)
        at java.lang.Class.getMethod(Unknown Source)
        at sun.launcher.LauncherHelper.validateMainClass(Unknown Source)
        at sun.launcher.LauncherHelper.checkAndLoadMain(Unknown Source)

The error above sends mixed messages. Class file verification clearly failed: java.lang.VerifyError: Bad instruction: cc. At the same time, the JVM recognized the opcode and gave it a name: @9: fast_bgetfield. It also seems to treat it as a different instruction, because fast_bgetfield does not suggest pushing a constant...
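The offset reported in the error can be checked against the hex dump directly. The following is a minimal Java sketch (the class and method names are hypothetical, chosen only for illustration) that reads the byte at offset 9 from the first row of the dump. The value is 0xcc, which is 204 in decimal; in the list above that number is LDC_W_QUICK rather than LDC_QUICK (203).

```java
public class HexDumpByte {
    // First row of the "Bytecode:" dump from the VerifyError above.
    static final String FIRST_ROW = "bb00 0559 b700 064c 2bcc 07b6 0008 572b";

    // Returns the unsigned byte value at the given offset within the row.
    static int byteAt(String row, int offset) {
        String hex = row.replace(" ", "");
        return Integer.parseInt(hex.substring(offset * 2, offset * 2 + 2), 16);
    }

    public static void main(String[] args) {
        int opcode = byteAt(FIRST_ROW, 9);
        // prints: byte at @9 = 0xcc (204)
        System.out.printf("byte at @9 = 0x%02x (%d)%n", opcode, opcode);
    }
}
```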

I think it's fair to say I am quite confused. What are these illegal opcodes? Do JVMs run them? Why am I receiving VerifyErrors? Are they deprecated? And do they have an advantage over their documented counterparts?

Any insight would be greatly appreciated.

Upvotes: 11

Views: 2450

Answers (3)

jdb

Reputation: 4509

The first edition of the Java virtual machine specification described a technique used by one of Sun's early implementations of the Java virtual machine to speed up the interpretation of bytecodes. In this scheme, opcodes that refer to constant pool entries are replaced by a "_quick" opcode when the constant pool entry is resolved. When the virtual machine encounters a _quick instruction, it knows the constant pool entry is already resolved and can therefore execute the instruction faster.

The core instruction set of the Java virtual machine consists of 200 single-byte opcodes. These 200 opcodes are the only opcodes you will ever see in class files. Virtual machine implementations that use the "_quick" technique use another 25 single-byte opcodes internally, the "_quick" opcodes.

For example, when a virtual machine that uses the _quick technique resolves a constant pool entry referred to by an ldc instruction (opcode value 0x12), it replaces the ldc opcode byte in the bytecode stream with an ldc_quick instruction (opcode value 0xcb). This technique is part of the process of replacing a symbolic reference with a direct reference in Sun's early virtual machine.

For some instructions, in addition to overwriting the normal opcode with a _quick opcode, a virtual machine that uses the _quick technique overwrites the operands of the instruction with data that represents the direct reference. For example, in addition to replacing an invokevirtual opcode with an invokevirtual_quick, the virtual machine also puts the method table offset and the number of arguments into the two operand bytes that follow every invokevirtual instruction. Placing the method table offset in the bytecode stream following the invokevirtual_quick opcode saves the virtual machine the time it would take to look up the offset in the resolved constant pool entry.

Chapter 8 of Inside the Java Virtual Machine

Basically, you cannot put these opcodes in the class file yourself. Only the JVM can substitute them, after it resolves the operands.
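The rewriting described in the quoted passage can be illustrated with a toy interpreter. This is only a sketch under stated assumptions, not how any real JVM is implemented: the class name, the two maps standing in for a constant pool, and the "resolution" step (upper-casing a string) are all hypothetical. Only the opcode values 0x12 (ldc), 0xcb (ldc_quick) and 0xb1 (return) come from the text and the specification.

```java
import java.util.HashMap;
import java.util.Map;

public class QuickeningSketch {
    static final int LDC = 0x12;        // standard opcode, as found in class files
    static final int LDC_QUICK = 0xcb;  // internal opcode from the "_quick" scheme
    static final int RETURN = 0xb1;

    // Hypothetical stand-ins for a constant pool and its resolved entries.
    final Map<Integer, String> unresolved = new HashMap<>();
    final Map<Integer, Object> resolved = new HashMap<>();
    int resolutions = 0;

    // Pretend resolution is expensive; count how often it happens.
    Object resolve(int index) {
        resolutions++;
        Object value = unresolved.get(index).toUpperCase();
        resolved.put(index, value);
        return value;
    }

    // Interprets the code array and returns the last value pushed.
    Object run(byte[] code) {
        Object top = null;
        int pc = 0;
        while (true) {
            int opcode = code[pc] & 0xff;
            switch (opcode) {
                case LDC:
                    top = resolve(code[pc + 1] & 0xff);
                    code[pc] = (byte) LDC_QUICK; // rewrite in place; operand unchanged
                    pc += 2;
                    break;
                case LDC_QUICK:
                    top = resolved.get(code[pc + 1] & 0xff); // entry already resolved
                    pc += 2;
                    break;
                case RETURN:
                    return top;
                default:
                    throw new IllegalStateException(
                            "Bad instruction: " + Integer.toHexString(opcode));
            }
        }
    }

    public static void main(String[] args) {
        QuickeningSketch vm = new QuickeningSketch();
        vm.unresolved.put(7, "hello");
        byte[] code = {LDC, 7, (byte) RETURN};
        vm.run(code); // first run resolves entry 7 and rewrites ldc to ldc_quick
        vm.run(code); // second run takes the ldc_quick path
        System.out.printf("opcode now 0x%02x, resolutions = %d%n",
                code[0] & 0xff, vm.resolutions);
    }
}
```

In this sketch the _quick opcode only ever appears in the in-memory code array after the interpreter has done the resolution itself, which is the point made above: the rewritten form assumes state that exists only inside the running VM.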

Upvotes: 12

Antimony

Reputation: 39451

These opcodes are reserved and cannot appear in a valid class file, hence the VerifyError. The JVM does use them internally, though, so the in-memory representation of some bytecode might contain these opcodes after modification by the VM. This is purely an implementation detail.

Upvotes: 3

Ted Hopp

Reputation: 234795

I don't know about all of the opcodes you have listed, but three of them—breakpoint, impdep1, and impdep2—are reserved opcodes documented in Section 6.2 of the Java Virtual Machine Specification. It says, in part:

Two of the reserved opcodes, numbers 254 (0xfe) and 255 (0xff), have the mnemonics impdep1 and impdep2, respectively. These instructions are intended to provide "back doors" or traps to implementation-specific functionality implemented in software and hardware, respectively. The third reserved opcode, number 202 (0xca), has the mnemonic breakpoint and is intended to be used by debuggers to implement breakpoints.

Although these opcodes have been reserved, they may be used only inside a Java virtual machine implementation. They cannot appear in valid class files. . . .

I suspect (from their names) that the rest of the opcodes are part of the JIT mechanism and also cannot appear in a valid class file.
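As a rough illustration of the distinction, a bytecode library could classify an opcode byte before writing it to a class file. The sketch below is hypothetical (the class name, method name and returned labels are invented here). It assumes that the standard instructions occupy 0x00 through 0xc9, which holds for Java SE 7 and later, where 0xba is invokedynamic. The three reserved values are the ones quoted from the specification above.

```java
public class OpcodeKind {
    // Classifies a single opcode byte (0..255).
    static String classify(int opcode) {
        if (opcode < 0 || opcode > 0xff) {
            throw new IllegalArgumentException("not a byte value: " + opcode);
        }
        if (opcode == 0xca) return "reserved: breakpoint";
        if (opcode == 0xfe) return "reserved: impdep1";
        if (opcode == 0xff) return "reserved: impdep2";
        // Assumption: 0x00..0xc9 are the standard instructions (Java SE 7+).
        if (opcode <= 0xc9) return "standard";
        // 0xcb..0xfd: not assigned by the specification; a VM may use them internally.
        return "unassigned";
    }

    public static void main(String[] args) {
        int[] samples = {0x12, 0xca, 0xcb, 0xcc, 0xfe, 0xff};
        for (int op : samples) {
            System.out.printf("0x%02x -> %s%n", op, classify(op));
        }
    }
}
```

Only values classified as "standard" here would be expected to pass verification; everything else, including the _QUICK values from the question, would be rejected when the class file is loaded.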

Upvotes: 4
