Reputation: 687
I am research String.intern() and this method have a performance penalty. I've compared String.intern() with ConcurrentHashMap.putIfAbsent(s,s) with Microbenchmark. Used Java1.8.0_212, Ubuntu 18.04.2 LTS
@Param({"1", "100", "10000", "1000000"})
private int size;
private StringIntern stringIntern;
private ConcurrentHashMapIntern concurrentHashMapIntern;
@Setup
public void setup(){
stringIntern = new StringIntern();
concurrentHashMapIntern = new ConcurrentHashMapIntern();
}
public static class StringIntern{
public String intern(String s){
return s.intern();
}
}
public static class ConcurrentHashMapIntern{
private final Map<String, String> map;
public ConcurrentHashMapIntern(){
map= new ConcurrentHashMap<>();
}
public String intern(String s){
String existString = map.putIfAbsent(s, s);
return (existString == null) ? s : existString;
}
}
@Benchmark
public void intern(Blackhole blackhole){
for(int count =0; count<size; count ++){
blackhole.consume(stringIntern.intern("Example "+count));
}
}
@Benchmark
public void concurrentHashMapIntern(Blackhole blackhole){
for(int count =0; count<size; count++){
blackhole.consume(concurrentHashMapIntern.intern("Example " +count));
}
}
Result as expected. ConcurrentHashMap faster than String.intern() when search string.
Benchmark (size) Mode Cnt Score Error Units
MyBenchmark.concurrentHashMapIntern 1 avgt 5 0.056 ± 0.007 us/op
MyBenchmark.concurrentHashMapIntern 100 avgt 5 6.094 ± 2.359 us/op
MyBenchmark.concurrentHashMapIntern 10000 avgt 5 787.802 ± 264.179 us/op
MyBenchmark.concurrentHashMapIntern 1000000 avgt 5 136504.010 ± 17872.866 us/op
MyBenchmark.intern 1 avgt 5 0.129 ± 0.007 us/op
MyBenchmark.intern 100 avgt 5 13.700 ± 2.404 us/op
MyBenchmark.intern 10000 avgt 5 1618.514 ± 460.563 us/op
MyBenchmark.intern 1000000 avgt 5 1027915.854 ± 638910.023 us/op
String.intern() slower than ConcurrentHashMap because String.intern() is native HashTable implementation. And then, read javadoc about HashTable, this documantation says:
If a thread-safe implementation is not needed, it is recommended to use HashMap in place of Hashtable. If a thread-safe highly-concurrent implementation is desired, then it is recommended to use ConcurrentHashMap in place of Hashtable.
This is very confusing situation. It recommend ConcurrentHashMap, but it using HashTable although performance penalty. Does anyone have any idea about why used native HashTable implemantation instance of ConcurrentHashMap ?
Upvotes: 1
Views: 633
Reputation: 719249
There are a number of things going on here:
Your benchmarks have very large error bars. The repeat counts are probably too small. This makes the results questionable.
It doesn't look like your benchmarks are resetting the "interned string" caches after each run1. So that means that the caches are growing, and each repetition will be starting with different conditions. This may explain the error bars ...
Your ConcurrentHashMap
is not functionally equivalent to String::intern
. The latter uses a native equivalent to Reference
objects to ensure that interned strings can be garbage collected. Your ConcurrentHashMap
implementation doesn't. Why does this matter?
ConcurrentHashMap
is a massive memory leak.String.intern() slower than ConcurrentHashMap because String.intern() is native HashTable implementation.
No. The real reason is that the native implementation is doing things differently:
intern
) string pool uses a custom hash table implemented in native code.Note that these things vary considerably across different Java versions.
This is very confusing situation. It recommend ConcurrentHashMap, but it using HashTable although performance penalty.
Now you are talking about a different scenario, that is not relevant to what you are doing.
Note that String::intern
doesn't use either HashTable
or HashMap
; see above.
The quote that you found is about how to get good concurrent performance from a hash table. Your benchmark is (AFAIK) single threaded. For a serial use-use case, HashMap
will give better performance than the others.
Does anyone have any idea about why used native
HashTable
implementation instance ofConcurrentHashMap
?
It doesn't use a hash table; see above. There are a number of reason that it doesn't HashTable
or HashMap
or ConcurrentHashMap
:
Reference
classes are significant.Finally, be carefully that you are not focusing on the wrong problem here. If you are trying to optimize interning because it is a bottleneck in your application, the other strategy is to not intern at all. In practice, it rarely saved memory (especially compared with G1GC's string de-duping) and rarely improves string handling performance.
In summary:
String::intern
is not optimized solely (even primarily) for speed.1 - And in the native intern
case, I don't think that is possible.
2 - A Java memory leak in the regular heap impacts on long-term GC performance because the retained objects need to be repeatedly marked and copied by the GC. There may be secondary effects too.
Upvotes: 6