Korn Pisey

Reputation: 135

Why is writing map entries to a HashSet slower than to a CopyOnWriteArraySet in Java?

I expected writing to a HashSet to be faster than to a CopyOnWriteArraySet; I'm not doing any multithreading here. Surprisingly, however, my benchmark results indicate that writing map entries to a CopyOnWriteArraySet is faster.

I benchmarked writing 1000 Map.Entry<Integer, Integer> objects into a HashSet vs. a CopyOnWriteArraySet.

Benchmark          (n)   Mode  Cnt     Score    Error  Units
A.writeToCOWAS    1000  thrpt    4  1269.214 ± 40.635  ops/s
A.writeToHashSet  1000  thrpt    4   223.118 ± 34.955  ops/s

In addition, benchmark results for equals() and hashCode() of Map.Entry show that the former is more expensive (lower throughput):

Benchmark           Mode  Cnt          Score          Error  Units
MapEntry.equals    thrpt    4  177773394.054 ± 75436098.388  ops/s
MapEntry.hashCode  thrpt    4  272240403.923 ± 38794396.671  ops/s

I believe writing to a HashSet calls hashCode(), while writing to a CopyOnWriteArraySet calls equals().

When writing Integer or String values, HashSet is much faster. So I'm wondering: what happens with the Map.Entry type, and why is CopyOnWriteArraySet faster according to my analysis?

My perf test:

import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@State(Scope.Benchmark)
@Fork(value = 2)
@Warmup(iterations = 2, time = 3)
@Measurement(iterations = 2, time = 3)
public class A {
    public Set<Map.Entry<Integer,Integer>> set;

    @Param({"1000"})
    public int n;

    @Setup
    public void setup() {
        set = new HashSet<>((int) (n / 0.75f + 1f), 0.75f);
        for (int i = 0; i < n; i++)
            set.add(Map.entry(i, i));
    }

    private void eliminateDeadCode(Set<Map.Entry<Integer,Integer>> out, Blackhole hole) {
        int hash = 0;
        for (Map.Entry<Integer,Integer> o : out)
            hash += o.hashCode();
        hole.consume(hash);
        if (out.size() != set.size())
            throw new RuntimeException(out.size() + " != " + set.size());
    }

    @Benchmark
    public void writeToCOWAS(Blackhole hole) {
        Set<Map.Entry<Integer,Integer>> out = new CopyOnWriteArraySet<>(set);
        eliminateDeadCode(out, hole);
    }

    @Benchmark
    public void writeToHashSet(Blackhole hole) {
        Set<Map.Entry<Integer,Integer>> out = new HashSet<>(set);
        eliminateDeadCode(out, hole);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(A.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }
}

Upvotes: 6

Views: 178

Answers (2)

Hulk

Reputation: 6583

One essential aspect of HashMap/HashSet performance is the quality of the hash code distribution: a bad distribution can degrade the map to just a few large buckets, which leaves us with all the bookkeeping overhead and none of the benefits.

It seems that the Map.Entry implementation returned by Map.entry() yields a hash code of 0 whenever the hash codes of key and value are equal, at least in the JDK I'm currently using (see Evgeniy Berezovsky's answer for details):

public static void main(String[] args) {
    int n = 1000;
    HashSet<Map.Entry<Integer, Integer>> set = new HashSet<>((int) (n / 0.75f + 1f), 0.75f);
    for (int i = 0; i < n; i++)
        set.add(Map.entry(i, i));
    
    set.forEach(e -> System.out.println(e.hashCode()));         
}

Output:

0
0
...

This yields worst-case performance for any hash-based collection.


This is probably less of a problem when using it for the purpose it was designed for, i.e. (from the Javadoc):

These entries are suitable for populating Map instances using the Map.ofEntries() method.

because the map's internal nodes and the Map itself only use the hash codes of the keys and values, not those of the entries; it is a problem, however, when using entries as set elements or map keys themselves.
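
Not part of the original observation, but here is a minimal sketch of a workaround (the names Pair and PairDemo are made up): store a plain record instead of a Map.Entry. Note that Map.Entry.hashCode() is fixed by the interface contract to the XOR formula, so the fix is to avoid the interface rather than to override its hash:

import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class PairDemo {
    // A plain record instead of Map.Entry. hashCode() is overridden to make
    // the distribution explicit (the record default would also work in
    // practice, but its exact mixing is unspecified). Objects.hash mixes the
    // components asymmetrically (roughly 31 * key.hash + value.hash plus a
    // constant), so pairs with key == value no longer all collide on 0.
    // equals() stays the record's component-wise default, which is
    // consistent with this hashCode().
    record Pair<K, V>(K key, V value) {
        @Override
        public int hashCode() {
            return Objects.hash(key, value);
        }
    }

    public static void main(String[] args) {
        Set<Pair<Integer, Integer>> set = new HashSet<>();
        for (int i = 0; i < 1000; i++)
            set.add(new Pair<>(i, i));
        // Spot-check: hash codes are spread out rather than all 0.
        set.stream().limit(3).forEach(p -> System.out.println(p.hashCode()));
    }
}

With distinct hash codes, the HashSet should no longer degenerate into a single bucket, and insertion should be back to roughly constant time per element.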

Upvotes: 5

Evgeniy Berezovsky

Reputation: 19248

Hulk's answer is very instructive. However, the problem is not necessarily the Map.entry() hashCode() implementation, which is as follows, at least in Java 11:

public int hashCode() {
    return key.hashCode() ^ value.hashCode();
}

The problem is that the hash codes of key and value are always the same, both in the OP's test and in Hulk's test; since x ^ x == 0, combining them via XOR always ends up as 0. Change the test so that key and value differ, and performance will change.
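
A quick illustration (my own snippet; the class name XorDemo is made up): Integer.hashCode() returns the int value itself, so equal key and value cancel out under XOR, while distinct ones do not:

import java.util.Map;

public class XorDemo {
    public static void main(String[] args) {
        // Integer.hashCode() is the int value itself.
        System.out.println(Map.entry(42, 42).hashCode()); // 42 ^ 42 = 0
        System.out.println(Map.entry(42, 43).hashCode()); // 42 ^ 43 = 1
        System.out.println(Map.entry(7, 100).hashCode()); // 7 ^ 100 = 99
    }
}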

Upvotes: 5
