Reputation: 135
I thought writing to a HashSet
would be faster than to a CopyOnWriteArraySet
, since I'm not doing any multithreading here. Surprisingly, however, my benchmark results indicate that writing map entries to a CopyOnWriteArraySet is faster.
I benchmarked writing 1000 Map.Entry<Integer, Integer> instances into a HashSet vs. a CopyOnWriteArraySet.
Benchmark         (n)   Mode  Cnt     Score    Error  Units
A.writeToCOWAS    1000  thrpt   4  1269.214 ± 40.635  ops/s
A.writeToHashSet  1000  thrpt   4   223.118 ± 34.955  ops/s
In addition, benchmark results for equals() and hashCode() of Map.Entry
reveal that the former is more expensive.
Benchmark          Mode  Cnt          Score          Error  Units
MapEntry.equals    thrpt   4  177773394.054 ± 75436098.388  ops/s
MapEntry.hashCode  thrpt   4  272240403.923 ± 38794396.671  ops/s
I believe writing to a HashSet calls hashCode(), while writing to a CopyOnWriteArraySet calls equals().
When writing Integer or String values, HashSet is way faster. So what happens with the Map.Entry
type, and why is CopyOnWriteArraySet faster according to my analysis?
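To check which methods each set actually calls, one can count invocations with a wrapper key (a rough sketch I put together; the Key class and counters are illustrative, not from the JDK):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

public class CallCounter {
    static int hashCalls, equalsCalls;

    // Wrapper key that counts hashCode()/equals() invocations.
    static final class Key {
        final int id;
        Key(int id) { this.id = id; }
        @Override public int hashCode() { hashCalls++; return id; }
        @Override public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Key && ((Key) o).id == id;
        }
    }

    // Adds n distinct keys to the given set, returns {hashCode calls, equals calls}.
    static int[] count(Set<Key> set, int n) {
        hashCalls = 0;
        equalsCalls = 0;
        for (int i = 0; i < n; i++)
            set.add(new Key(i));
        return new int[] { hashCalls, equalsCalls };
    }

    public static void main(String[] args) {
        int[] hs = count(new HashSet<>(), 100);
        int[] cowas = count(new CopyOnWriteArraySet<>(), 100);
        System.out.println("HashSet: hashCode=" + hs[0] + " equals=" + hs[1]);
        System.out.println("COWAS:   hashCode=" + cowas[0] + " equals=" + cowas[1]);
    }
}
```

In the JDK implementations I looked at, HashSet calls hashCode() once per add (and equals() only on bucket collisions), while CopyOnWriteArraySet never calls hashCode() and instead compares each new element against every existing one via equals().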
My perf test:
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@State(Scope.Benchmark)
@Fork(value = 2)
@Warmup(iterations = 2, time = 3)
@Measurement(iterations = 2, time = 3)
public class A {
    public Set<Map.Entry<Integer, Integer>> set;

    @Param({"1000"})
    public int n;

    @Setup
    public void setup() {
        set = new HashSet<>((int) (n / 0.75f + 1f), 0.75f);
        for (int i = 0; i < n; i++)
            set.add(Map.entry(i, i));
    }

    private void eliminateDeadCode(Set<Map.Entry<Integer, Integer>> out, Blackhole hole) {
        int hash = 0;
        for (Map.Entry<Integer, Integer> o : out)
            hash += o.hashCode();
        hole.consume(hash);
        if (out.size() != set.size())
            throw new RuntimeException(out.size() + " != " + set.size());
    }

    @Benchmark
    public void writeToCOWAS(Blackhole hole) {
        Set<Map.Entry<Integer, Integer>> out = new CopyOnWriteArraySet<>(set);
        eliminateDeadCode(out, hole);
    }

    @Benchmark
    public void writeToHashSet(Blackhole hole) {
        Set<Map.Entry<Integer, Integer>> out = new HashSet<>(set);
        eliminateDeadCode(out, hole);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(A.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }
}
Upvotes: 6
Views: 178
Reputation: 6583
One essential aspect of HashMap/Set performance is the quality of the hashcode distribution, because bad distributions can lead to the map degrading to just a few large buckets, which leaves us with the bookkeeping overhead without getting any of the benefits.
It seems that the Map.Entry
implementation returned by Map.entry() always yields a hash code of 0
when the hash codes of key and value are equal, at least in the implementation I'm currently using (see Evgeniy Berezovsky's answer for details).
public static void main(String[] args) {
    int n = 1000;
    HashSet<Map.Entry<Integer, Integer>> set = new HashSet<>((int) (n / 0.75f + 1f), 0.75f);
    for (int i = 0; i < n; i++)
        set.add(Map.entry(i, i));
    set.forEach(e -> System.out.println(e.hashCode()));
}
Output:
0
0
...
This yields worst-case performance for any hash-based collection.
This is probably less of a problem when using it for the purpose it was designed for, i.e. (from the Javadoc):
These entries are suitable for populating Map instances using the Map.ofEntries() method.
because the Nodes and Map itself only use the hashcode of the keys and values, not that of the entries, but it is a problem when using them as keys themselves.
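A quick way to see the degeneration (my sketch, not part of the benchmark above): count the distinct hash codes the entries produce. When key equals value, every entry hashes to 0, so a HashSet puts all of them into a single bucket and each add has to do a linear equals() scan of that bucket.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class BucketDemo {
    // Returns how many distinct hash codes n entries of the form (i, i) produce.
    static int distinctHashes(int n) {
        Set<Integer> hashes = new HashSet<>();
        for (int i = 0; i < n; i++)
            hashes.add(Map.entry(i, i).hashCode()); // i ^ i == 0 for every i
        return hashes.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctHashes(1000)); // prints 1: all entries share hash 0
    }
}
```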
Upvotes: 5
Reputation: 19248
Hulk's answer is very instructive. However, the problem is not necessarily the Map.entry()
hashCode implementation, which is this, at least in Java 11:
public int hashCode() {
    return key.hashCode() ^ value.hashCode();
}
The problem is that the hash codes of key and value are always the same, both in the OP's test and in Hulk's test, so the hash codes combined via XOR always end up as 0. Change the test so that key and value differ, and performance will change.
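To make the XOR cancellation concrete (a minimal check, not from the answer): Integer.hashCode() is the int value itself, and Map.Entry.hashCode() is specified as key.hashCode() ^ value.hashCode(), so any entry whose key equals its value hashes to x ^ x == 0.

```java
import java.util.Map;

public class XorDemo {
    public static void main(String[] args) {
        // key == value: x ^ x == 0, regardless of x
        System.out.println(Map.entry(42, 42).hashCode()); // 0
        // key != value: the hash codes differ and survive the XOR
        System.out.println(Map.entry(42, 43).hashCode()); // 42 ^ 43 == 1
    }
}
```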
Upvotes: 5