Eric B.

Reputation: 24411

Any way to regenerate stackmap from byte code?

I have an old library (circa 2005) that performs byte code manipulation, but does not touch the stackmap. Consequently, my JVM (Java 8) complains that the classes are invalid. The only way to circumvent the errors is to run the JVM with -noverify, but that is not a long-term solution for me.

Is there some way I can regenerate the stack map after the classes have already been generated? I saw that the ClassWriter class has an option to regenerate the stack map, but I'm not sure how to read in a compiled class and write out a new one. Is that feasible?

Upvotes: 3

Views: 794

Answers (2)

Holger

Reputation: 298233

When you instrument old classes that have no stackmaps and keep their old version number, there is no problem, as the JVM will process them the same way as before, without requiring stackmaps. Of course, this implies that you can’t inject newer bytecode features.
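A quick way to tell which verifier will apply is to read the major version from the class file header. This is a minimal stdlib-only sketch (the file-name argument and class name are assumptions, not part of any library API):

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class ClassVersionCheck {
    // The class file header is: 4-byte magic 0xCAFEBABE, 2-byte minor
    // version, then the 2-byte major version, all big-endian. Major
    // version 50 corresponds to Java 6; classes at 50 or below can still
    // fall back to the old inference verifier and need no stackmap frames.
    static int majorVersion(byte[] classfile) {
        return ((classfile[6] & 0xFF) << 8) | (classfile[7] & 0xFF);
    }

    public static void main(String[] args) throws Exception {
        byte[] bytecode = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("major version: " + majorVersion(bytecode));
    }
}
```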

When you are instrumenting newer class files which had valid stackmaps before the transformation, you will not run into the problems described by Antimony. So you can use ASM to regenerate the stackmaps:

byte[] bytecode = … // result of your instrumentation
ClassReader cr = new ClassReader(bytecode);
ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_FRAMES);
cr.accept(cw, ClassReader.SKIP_FRAMES);
bytecode = cw.toByteArray(); // with recalculated stack maps

The visitor API is designed to allow easy chaining of a reader with a writer, adding code only to intercept the artifacts you want to change.

Note that since we know that we are going to regenerate the stackmap frames from scratch using ClassWriter.COMPUTE_FRAMES, we can pass ClassReader.SKIP_FRAMES to the reader to tell it not to process the source frames we’re going to ignore anyway.
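To address the "how to read in a byte class and rewrite a new one" part of the question, here is a minimal round-trip sketch wrapping the pass above with file I/O (the file name is an assumption; ASM must be on the classpath):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassWriter;

public class RegenerateFrames {
    public static void main(String[] args) throws Exception {
        Path classFile = Paths.get(args[0]); // e.g. "Foo.class" (assumption)
        byte[] bytecode = Files.readAllBytes(classFile);

        // Same pass as above: skip the possibly invalid source frames
        // and let the writer compute fresh ones from scratch.
        ClassReader cr = new ClassReader(bytecode);
        ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_FRAMES);
        cr.accept(cw, ClassReader.SKIP_FRAMES);

        Files.write(classFile, cw.toByteArray()); // overwrite in place
    }
}
```

Overwriting in place is just for brevity; writing to a separate output directory would be the safer choice in a build step.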

There is another optimization possible when we know that the class structure doesn’t change. We can pass the ClassReader to the ClassWriter’s constructor to draw a benefit from the unchanged structure, e.g. the target constant pool will get initialized with a copy of the source constant pool. This option, however, must be handled with care. If we don’t intercept methods at all, it will get optimized too, i.e. the code gets copied entirely without even recalculating the stack frames. So we need a custom method visitor to pretend that the code could potentially change:

byte[] bytecode = … // result of your instrumentation
ClassReader cr = new ClassReader(bytecode);
// passing cr to ClassWriter to enable optimizations
ClassWriter cw = new ClassWriter(cr, ClassWriter.COMPUTE_FRAMES);
cr.accept(new ClassVisitor(Opcodes.ASM5, cw) {
    @Override
    public MethodVisitor visitMethod(int access, String name, String desc,
                                     String signature, String[] exceptions) {
        MethodVisitor writer = super.visitMethod(access, name, desc, signature, exceptions);
        return new MethodVisitor(Opcodes.ASM5, writer) {
            // not changing anything, just preventing code specific optimizations
        };
    }
}, ClassReader.SKIP_FRAMES);
bytecode = cw.toByteArray(); // with recalculated stack maps

This way, unchanged artifacts like the constant pool can be copied directly to the target byte code while the stackmap frames still get recalculated.

There are some caveats, though. Generating stackmaps from scratch implies not utilizing any knowledge about the original code structure or the nature of the transformation. E.g. a compiler knows the formal types of local variable declarations, whereas the ClassWriter may see different actual types for which it has to find the common base type. This search may be very expensive and may trigger the loading of classes that would otherwise be loaded lazily, or never at all, during normal execution. The resulting type may even differ from the common type declared in the original code. It will still be a correct type, but may again change which classes the resulting code uses.

If you are performing the instrumentation in a different environment, ASM’s attempts to load the classes for determining the common type may fail. Then, you will have to override ClassWriter.getCommonSuperClass(…) with an implementation which can perform the operation in that environment. This is also the place to add optimizations, if you have more knowledge about the code and can provide answers without expensive searches through the type hierarchy.
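A sketch of such an override follows. The lookup strategy here (a hypothetical precomputed map with an Object fallback) is an assumption for illustration; ASM's default implementation resolves the types by loading the classes, which is exactly what an offline environment may not be able to do:

```java
import java.util.HashMap;
import java.util.Map;

import org.objectweb.asm.ClassWriter;

public class OfflineClassWriter extends ClassWriter {
    // Hypothetical precomputed hierarchy knowledge; a real tool could
    // fill this by parsing the superclass entries of the class files on
    // the instrumentation classpath instead of loading the classes.
    private final Map<String, String> knownSupers = new HashMap<>();

    OfflineClassWriter(int flags) {
        super(flags);
        knownSupers.put("java/util/ArrayList", "java/util/AbstractList");
    }

    @Override
    protected String getCommonSuperClass(String type1, String type2) {
        // Cheap exact answer first...
        if (type1.equals(type2)) {
            return type1;
        }
        // ...then any precomputed knowledge, and finally the fallback
        // that is always a correct (if imprecise) common supertype.
        String s = knownSupers.get(type1);
        if (s != null && s.equals(knownSupers.get(type2))) {
            return s;
        }
        return "java/lang/Object";
    }
}
```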

Generally, it’s recommended to refactor that old library to use ASM in the first place, instead of needing a subsequent adaptation step. As explained above, when performing the code transformation using a chain of ClassReader and ClassWriter with optimizations enabled, ASM would be able to copy all unchanged methods, including their stackmaps, and only recalculate the stackmaps of actually changed methods. In the code above, doing the recalculation in a subsequent step, we had to disable the optimization, as we no longer know which methods were actually changed.

The next logical step would be to incorporate stackmap handling into the instrumentation itself, as more often than not, knowledge of the actual transformation allows keeping 99% of the existing frames and easily adapting the others, instead of requiring an expensive recalculation from scratch.

Upvotes: 6

Antimony

Reputation: 39451

As far as how to read in the class, you should be able to just use a ClassReader.

As for the more general question about the feasibility of automatically adding stack maps to old classes: in most cases, it is possible. However, there are a few obscure cases where it is not, mostly because the inference verifier is laxer than the stackmap verifier. Note that these only apply when adding a stack map to old code that never had one. If you are modifying existing Java 8 code, you can ignore all this.

First off are the jsr and ret instructions, which are only allowed in classfile versions <= 49 (corresponding to Java 5). If you want to port code that uses them, you will have to rewrite it to duplicate and inline all the subroutine bodies.

Apart from that, there are more minor issues. For example, the inference verifier allows you to freely mix boolean and byte arrays (they are considered the same type by the verifier), but the stackmap verifier treats them as distinct types.

Another potential issue is that with inference verification, dead code is never checked at all, while the stackmap verifier still requires you to specify stack maps for everything. In this case, the fix is easy - delete all the dead code.

Lastly, there is the issue that stackmaps require you to specify the common superclasses of types upfront when they merge in the control flow, whereas with inference verification, you don't need to explicitly specify supertypes. Most of the time, this won't matter, since you have a known inheritance hierarchy, but it is theoretically possible to inherit from classes that are only defined at runtime via a ClassLoader.

And of course, the stackmaps require corresponding entries in the constant pool, which means that you have less space in the constant pool for everything else. If you have a class that is close to hitting the maximum constant pool size, then adding a stack map may not be possible. This is very rare, but may happen with autogenerated code.

P.S. There is also the possibility of going in the other direction. If your code doesn't use any version 51.0 or 52.0 specific features (which is basically just invokedynamic, i.e. lambdas), then you can set the classfile version to 50.0, removing the need for a stack map. Of course, this is something of a backwards solution, and it will become increasingly hard as future classfile versions add more attractive features.
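That version change is just a two-byte patch on the header. A sketch (assuming, as stated above, that the class really uses no post-50.0 features — patching the version does not remove any such dependency):

```java
public class DowngradeVersion {
    // The major version lives at bytes 6-7 of the class file, big-endian,
    // after the 4-byte magic and the 2-byte minor version. Setting it to
    // 50 (0x0032, Java 6) lets the JVM fall back to the inference
    // verifier, so no StackMapTable attribute is required.
    static byte[] toVersion50(byte[] classfile) {
        byte[] patched = classfile.clone();
        patched[4] = 0;  // minor version = 0
        patched[5] = 0;
        patched[6] = 0;  // major version = 50
        patched[7] = 50;
        return patched;
    }
}
```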

Upvotes: 2
