Reputation: 41
I am creating two constructs for a university second-semester computer science course to count words in a text. One implementation uses an array of Word objects, each storing the word as a String and its frequency as an int. The other uses a HashMap with the word as key and the frequency as value. Now there is a function "totalWords" that should return the sum of all frequencies.
In the HashMap variant:
return _map.values().stream().reduce(0, (a, b) -> a + b);
In the array variant:
return Arrays.stream(_words)
.map((word) -> word != null ? word.count() : 0)
.reduce(0, (a, b) -> a + b);
My problem is: in a JUnit test with a very short test text, the array variant needs approximately 0.001 s and the map variant needs 0.040 s, and I do not understand why the map needs so much more time. Does anybody have an explanation and maybe a better solution?
Upvotes: 3
Views: 378
Reputation: 7576
One of the reasons is that iterating a HashMap can be much slower than iterating an array, and the reason is locality. The performance bottleneck of a modern processor is dominated by memory access, which is why caches are used. An array stores its data in a contiguous chunk of memory, so when that chunk is pulled into the cache you are likely to use everything in it; you get cache hits. Caches like contiguous data. Each element of a HashMap, on the other hand, may live at a different place in memory, so when you traverse the HashMap you get a lot of cache misses and end up swapping data in and out of the cache all the time, which dramatically slows down your program.
Even though the actual HashMap implementation is optimized so that its data is clustered together in memory, (@Radiodef) a HashMap uses some sort of linked list, so each element carries extra pointers. A HashMap therefore consumes more memory than an array; more memory means more cache misses and more page faults, so a HashMap is in general slower than an array.
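Aside from locality, both stream pipelines in the question also box every int, because reduce(0, (a, b) -> a + b) works on a Stream&lt;Integer&gt;. A sketch of primitive-stream versions (the Word class here is a stand-in for the one in the question, which I have not seen):

```java
import java.util.Arrays;
import java.util.Map;

// Stand-in for the Word class from the question (fields assumed).
class Word {
    final String text;
    final int count;
    Word(String text, int count) { this.text = text; this.count = count; }
    int count() { return count; }
}

class WordCounts {
    // HashMap variant: unbox once into an IntStream, then sum primitives.
    static int totalWords(Map<String, Integer> map) {
        return map.values().stream().mapToInt(Integer::intValue).sum();
    }

    // Array variant: mapToInt avoids creating a Stream<Integer> entirely.
    static int totalWords(Word[] words) {
        return Arrays.stream(words)
                     .mapToInt(w -> w != null ? w.count() : 0)
                     .sum();
    }
}
```

This will not change the locality story, but it removes the per-element Integer allocations from both variants.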
Upvotes: 4
Reputation: 37855
A HashMap is a data structure which has (basically) an array of linked lists:
0: [ a ] -> [ b ] -> [ c ]
1: [ ]
2: [ ]
3: [ d ] -> [ e ]
4: [ ]
5: [ f ]
6: [ ]
7: [ ]
The linked lists hold entries whose keys landed in the same bucket (called a "collision").
So the data structure has "holes" in it, and also it's more fragmented than an array because a HashMap has an object for every entry. Iterating a HashMap will generate more loads from memory than iterating an array.
I also agree with JB Nizet that your benchmark is probably flawed. A good benchmark would probably still show the array performing better, but not by as stark a margin.
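To illustrate why a single cold JUnit timing misleads: the first run includes class loading and interpreted execution, before the JIT has compiled the stream pipelines. A minimal sketch (sizes and structure are illustrative, not from the original post) that warms up both pipelines before timing them:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.IntStream;

class WarmupBench {
    // Runs a task repeatedly and returns the total elapsed time in ns.
    static long timeSum(Runnable task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 10_000;
        int[] array = IntStream.range(0, n).toArray();
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) map.put(i, i);

        Runnable arraySum = () -> IntStream.of(array).sum();
        Runnable mapSum =
            () -> map.values().stream().mapToInt(Integer::intValue).sum();

        // Warm-up: give the JIT a chance to compile both pipelines.
        timeSum(arraySum, 10_000);
        timeSum(mapSum, 10_000);

        // Measured runs; the gap should be far smaller than on a cold call.
        System.out.println("array: " + timeSum(arraySum, 10_000) + " ns");
        System.out.println("map:   " + timeSum(mapSum, 10_000) + " ns");
    }
}
```

For anything beyond a sanity check, a proper harness such as JMH is the usual tool; hand-rolled timing like this is only a sketch.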
Upvotes: 0