Reputation: 5193
The following code returns whether a given String s
is equal to any other of the hard-coded strings. The method uses the switch
-statement to do so:
public class SwitchOnString {
public static boolean equalsAny(String s) {
switch (s) {
case "string 1":
return true;
case "string 2":
return true;
default:
return false;
}
}
}
According to the Java Virtual Machine Specification (JMS) 3.10 Compiling Switches"
Compilation of switch statements uses the tableswitch and lookupswitch instructions.
Moreover
tableswitch and lookupswitch instructions operate only on
int
data.
I read chapter 3.10 but didn't find anywhere String
mentioned.
The only one sentence which comes indirectly close is:
Other numeric types must be narrowed to type int for use in a switch.
Question 1:
Is String
in this context also a numeric type? Or did I missed something?
A javap -c
on the class SwitchOnString
shows:
Compiled from "SwitchOnString.java"
public class playground.SwitchOnString {
public playground.SwitchOnString();
...
public static boolean equalsAny(java.lang.String);
Code:
0: aload_0
1: dup
2: astore_1
3: invokevirtual #16 // Method java/lang/String.hashCode:()I
6: lookupswitch { // 2
1117855161: 32
1117855162: 44
default: 60
}
...
}
Obviously the hashCode
values are used as int
-keys of the case
s. This probably matches:
The lookupswitch instruction pairs
int
keys (the values of the case labels) ...
Going on to tableswitch and lookupswitch the JMS says:
The tableswitch instruction is used when the cases of the switch can be efficiently represented as indices into a table of target offsets. (...) Where the cases of the switch are sparse, the table representation of the tableswitch instruction becomes inefficient in terms of space. The lookupswitch instruction may be used instead.
If I get this right then the more the cases become sparse the more probably the lookupswitch will be used.
Question 2:
But looking at the bytecode:
Are two string cases sparse enough to compile switch
to lookupswitch? Or will every switch on String
be compiled to lookupswitch?
Upvotes: 4
Views: 254
Reputation: 298459
The specification doesn’t tell how to compile switch
statements, that’s up to the compiler.
In that regard, the JVMS statement, “Other numeric types must be narrowed to type int
for use in a switch
” does not say that the Java programming language will do such a conversion nor that String
or Enum
are numeric types. I.e. long
, float
and double
are numeric types, but there’s no support for using them with switch
statements in the Java programming language.
So the language specification says that switch
over String
is supported, hence, the compilers must find a way to compile them to bytecode. Using an invariant property like the hash code is a common solution, but in principle, other properties like the length or an arbitrary character could be used as well.
As discussed in “Why switch on String compiles into two switches” and “Java 7 String switch decompiled: unexpected instruction”, javac
currently generates two switch instructions on the bytecode level when compiling switch
over String
values (ECJ also generates two instructions, but details may differ).
Then, the compiler has to pick either, a lookupswitch
or tableswitch
instruction. javac
does use tableswitch
when the numbers are not sparse, but only if the statement has more than two case labels.
So when I compile the following method:
public static char two(String s) {
switch(s) {
case "a": return 'a';
case "b": return 'b';
}
return 0;
}
I get
public static char two(java.lang.String);
Code:
0: aload_0
1: astore_1
2: iconst_m1
3: istore_2
4: aload_1
5: invokevirtual #9 // Method java/lang/String.hashCode:()I
8: lookupswitch { // 2
97: 36
98: 50
default: 61
}
36: aload_1
37: ldc #10 // String a
39: invokevirtual #11 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
42: ifeq 61
45: iconst_0
46: istore_2
47: goto 61
50: aload_1
51: ldc #12 // String b
53: invokevirtual #11 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
56: ifeq 61
59: iconst_1
60: istore_2
61: iload_2
62: lookupswitch { // 2
0: 88
1: 91
default: 94
}
88: bipush 97
90: ireturn
91: bipush 98
93: ireturn
94: iconst_0
95: ireturn
but when I compile,
public static char three(String s) {
switch(s) {
case "a": return 'a';
case "b": return 'b';
case "c": return 'c';
}
return 0;
}
I get
public static char three(java.lang.String);
Code:
0: aload_0
1: astore_1
2: iconst_m1
3: istore_2
4: aload_1
5: invokevirtual #9 // Method java/lang/String.hashCode:()I
8: tableswitch { // 97 to 99
97: 36
98: 50
99: 64
default: 75
}
36: aload_1
37: ldc #10 // String a
39: invokevirtual #11 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
42: ifeq 75
45: iconst_0
46: istore_2
47: goto 75
50: aload_1
51: ldc #12 // String b
53: invokevirtual #11 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
56: ifeq 75
59: iconst_1
60: istore_2
61: goto 75
64: aload_1
65: ldc #13 // String c
67: invokevirtual #11 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
70: ifeq 75
73: iconst_2
74: istore_2
75: iload_2
76: tableswitch { // 0 to 2
0: 104
1: 107
2: 110
default: 113
}
104: bipush 97
106: ireturn
107: bipush 98
109: ireturn
110: bipush 99
112: ireturn
113: iconst_0
114: ireturn
It’s not clear why javac
makes this choice. While tableswitch
has a higher base footprint (one additional 32 bit word) compared to lookupswitch
, it would still be shorter in bytecode, even for the two case
labels scenario.
But the consistency of the decision can be shown with the second statement, which will always have the same value range, but compiles to lookupswitch
or tableswitch
depending only on the number of labels. So when using truly sparse values:
public static char three(String s) {
switch(s) {
case "a": return 'a';
case "b": return 'b';
case "": return 0;
}
return 0;
}
it compiles to
public static char three(java.lang.String);
Code:
0: aload_0
1: astore_1
2: iconst_m1
3: istore_2
4: aload_1
5: invokevirtual #9 // Method java/lang/String.hashCode:()I
8: lookupswitch { // 3
0: 72
97: 44
98: 58
default: 83
}
44: aload_1
45: ldc #10 // String a
47: invokevirtual #11 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
50: ifeq 83
53: iconst_0
54: istore_2
55: goto 83
58: aload_1
59: ldc #12 // String b
61: invokevirtual #11 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
64: ifeq 83
67: iconst_1
68: istore_2
69: goto 83
72: aload_1
73: ldc #13 // String
75: invokevirtual #11 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
78: ifeq 83
81: iconst_2
82: istore_2
83: iload_2
84: tableswitch { // 0 to 2
0: 112
1: 115
2: 118
default: 120
}
112: bipush 97
114: ireturn
115: bipush 98
117: ireturn
118: iconst_0
119: ireturn
120: iconst_0
121: ireturn
using lookupswitch
for the sparse hash codes, but tableswitch
for the second switch.
Upvotes: 3