LuCio
LuCio

Reputation: 5193

Is String a numeric type regarding switch and always compiled to lookupswitch?

The following code returns whether a given String s is equal to any other of the hard-coded strings. The method uses the switch-statement to do so:

public class SwitchOnString {
    public static boolean equalsAny(String s) {
        switch (s) {
        case "string 1":
            return true;
        case "string 2":
            return true;
        default:
            return false;
        }
    }
}

According to the Java Virtual Machine Specification (JMS) 3.10 Compiling Switches"

Compilation of switch statements uses the tableswitch and lookupswitch instructions.

Moreover

tableswitch and lookupswitch instructions operate only on int data.

I read chapter 3.10 but didn't find anywhere String mentioned.

The only one sentence which comes indirectly close is:

Other numeric types must be narrowed to type int for use in a switch.

Question 1:
Is String in this context also a numeric type? Or did I missed something?

A javap -c on the class SwitchOnString shows:

Compiled from "SwitchOnString.java"
public class playground.SwitchOnString {
  public playground.SwitchOnString();
   ...

  public static boolean equalsAny(java.lang.String);
    Code:
       0: aload_0
       1: dup
       2: astore_1
       3: invokevirtual #16                 // Method java/lang/String.hashCode:()I
       6: lookupswitch  { // 2
            1117855161: 32
            1117855162: 44
               default: 60
          }
   ...

}

Obviously the hashCode values are used as int-keys of the cases. This probably matches:

The lookupswitch instruction pairs int keys (the values of the case labels) ...

Going on to tableswitch and lookupswitch the JMS says:

The tableswitch instruction is used when the cases of the switch can be efficiently represented as indices into a table of target offsets. (...) Where the cases of the switch are sparse, the table representation of the tableswitch instruction becomes inefficient in terms of space. The lookupswitch instruction may be used instead.

If I get this right then the more the cases become sparse the more probably the lookupswitch will be used.

Question 2:
But looking at the bytecode:
Are two string cases sparse enough to compile switch to lookupswitch? Or will every switch on String be compiled to lookupswitch?

Upvotes: 4

Views: 254

Answers (1)

Holger
Holger

Reputation: 298459

The specification doesn’t tell how to compile switch statements, that’s up to the compiler.

In that regard, the JVMS statement, “Other numeric types must be narrowed to type int for use in a switch” does not say that the Java programming language will do such a conversion nor that String or Enum are numeric types. I.e. long, float and double are numeric types, but there’s no support for using them with switch statements in the Java programming language.

So the language specification says that switch over String is supported, hence, the compilers must find a way to compile them to bytecode. Using an invariant property like the hash code is a common solution, but in principle, other properties like the length or an arbitrary character could be used as well.

As discussed in “Why switch on String compiles into two switches” and “Java 7 String switch decompiled: unexpected instruction”, javac currently generates two switch instructions on the bytecode level when compiling switch over String values (ECJ also generates two instructions, but details may differ).

Then, the compiler has to pick either, a lookupswitch or tableswitch instruction. javac does use tableswitch when the numbers are not sparse, but only if the statement has more than two case labels.

So when I compile the following method:

public static char two(String s) {
    switch(s) {
        case "a": return 'a';
        case "b": return 'b';
    }
    return 0;
}

I get

public static char two(java.lang.String);
Code:
   0: aload_0
   1: astore_1
   2: iconst_m1
   3: istore_2
   4: aload_1
   5: invokevirtual #9                  // Method java/lang/String.hashCode:()I
   8: lookupswitch  { // 2
                97: 36
                98: 50
           default: 61
      }
  36: aload_1
  37: ldc           #10                 // String a
  39: invokevirtual #11                 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
  42: ifeq          61
  45: iconst_0
  46: istore_2
  47: goto          61
  50: aload_1
  51: ldc           #12                 // String b
  53: invokevirtual #11                 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
  56: ifeq          61
  59: iconst_1
  60: istore_2
  61: iload_2
  62: lookupswitch  { // 2
                 0: 88
                 1: 91
           default: 94
      }
  88: bipush        97
  90: ireturn
  91: bipush        98
  93: ireturn
  94: iconst_0
  95: ireturn

but when I compile,

public static char three(String s) {
    switch(s) {
        case "a": return 'a';
        case "b": return 'b';
        case "c": return 'c';
    }
    return 0;
}

I get

public static char three(java.lang.String);
Code:
   0: aload_0
   1: astore_1
   2: iconst_m1
   3: istore_2
   4: aload_1
   5: invokevirtual #9                  // Method java/lang/String.hashCode:()I
   8: tableswitch   { // 97 to 99
                97: 36
                98: 50
                99: 64
           default: 75
      }
  36: aload_1
  37: ldc           #10                 // String a
  39: invokevirtual #11                 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
  42: ifeq          75
  45: iconst_0
  46: istore_2
  47: goto          75
  50: aload_1
  51: ldc           #12                 // String b
  53: invokevirtual #11                 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
  56: ifeq          75
  59: iconst_1
  60: istore_2
  61: goto          75
  64: aload_1
  65: ldc           #13                 // String c
  67: invokevirtual #11                 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
  70: ifeq          75
  73: iconst_2
  74: istore_2
  75: iload_2
  76: tableswitch   { // 0 to 2
                 0: 104
                 1: 107
                 2: 110
           default: 113
      }
 104: bipush        97
 106: ireturn
 107: bipush        98
 109: ireturn
 110: bipush        99
 112: ireturn
 113: iconst_0
 114: ireturn

It’s not clear why javac makes this choice. While tableswitch has a higher base footprint (one additional 32 bit word) compared to lookupswitch, it would still be shorter in bytecode, even for the two case labels scenario.

But the consistency of the decision can be shown with the second statement, which will always have the same value range, but compiles to lookupswitch or tableswitch depending only on the number of labels. So when using truly sparse values:

public static char three(String s) {
    switch(s) {
        case "a": return 'a';
        case "b": return 'b';
        case "": return 0;
    }
    return 0;
}

it compiles to

public static char three(java.lang.String);
Code:
   0: aload_0
   1: astore_1
   2: iconst_m1
   3: istore_2
   4: aload_1
   5: invokevirtual #9                  // Method java/lang/String.hashCode:()I
   8: lookupswitch  { // 3
                 0: 72
                97: 44
                98: 58
           default: 83
      }
  44: aload_1
  45: ldc           #10                 // String a
  47: invokevirtual #11                 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
  50: ifeq          83
  53: iconst_0
  54: istore_2
  55: goto          83
  58: aload_1
  59: ldc           #12                 // String b
  61: invokevirtual #11                 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
  64: ifeq          83
  67: iconst_1
  68: istore_2
  69: goto          83
  72: aload_1
  73: ldc           #13                 // String
  75: invokevirtual #11                 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
  78: ifeq          83
  81: iconst_2
  82: istore_2
  83: iload_2
  84: tableswitch   { // 0 to 2
                 0: 112
                 1: 115
                 2: 118
           default: 120
      }
 112: bipush        97
 114: ireturn
 115: bipush        98
 117: ireturn
 118: iconst_0
 119: ireturn
 120: iconst_0
 121: ireturn

using lookupswitch for the sparse hash codes, but tableswitch for the second switch.

Upvotes: 3

Related Questions