Justin Borromeo

Reputation: 1371

Memory Optimization of Java Collectors.toMap

I have a function that converts a list to a map. The map's size won't change after this function is called. I'm trying to decide between the following two implementations:

// KeyedObject stands in for the actual element type, which exposes a getKey() method.

/* Implementation One: pre-size the map, then populate it in a loop. */
Map<Long, KeyedObject> listToMap(List<KeyedObject> objs) {
    Map<Long, KeyedObject> map = new HashMap<>(objs.size(), 1);
    for (KeyedObject obj : objs) {
        map.put(obj.getKey(), obj);
    }
    return map;
}

/* Implementation Two: collect with the Stream API. */
Map<Long, KeyedObject> listToMap(List<KeyedObject> objs) {
    return objs.stream().collect(Collectors.toMap(KeyedObject::getKey, obj -> obj));
}

In the first implementation, I've allocated just enough capacity for all the elements by passing the list's size and a load factor of 1. Since HashMap rounds the requested capacity up to the next power of two and only resizes once the entry count exceeds capacity × load factor, this ensures that a resize operation won't be performed. Then, I iterate through the list and add the elements one by one.

In the second implementation, I use Java 8 streams for better readability.

My question is: will the second implementation involve multiple resizes of the HashMap or has it been optimized to allocate just enough memory?

Upvotes: 12

Views: 2283

Answers (3)

Tomasz Linkowski

Reputation: 4496

Summing up what everyone else said and adding a little, here's a way to do it using a custom Collector. However, you should keep in mind two things:

  1. Continuing the thought of Stephen C from his answer, you shouldn't really worry about optimizing such cases until you've found that they really are a performance bottleneck in your application. As Donald Knuth said, "premature optimization is the root of all evil".

  2. As shmosel pointed out in a comment, a Collector that allocates a HashMap with a predefined size will over-allocate if said Collector is used in a parallel mode. Therefore, the Collector I propose doesn't support parallel collection.

Having said that, you can write the following generic Collectors:

import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class ExtraCollectors {

    public static <T, K, V> Collector<T, ?, HashMap<K, V>> toSizedMap(
            Function<? super T, ? extends K> keyMapper, Function<? super T, ? extends V> valueMapper, int size) {
        return toSequentialMap(
                () -> com.google.common.collect.Maps.newHashMapWithExpectedSize(size),
                keyMapper, valueMapper, Collector.Characteristics.UNORDERED
        );
    }

    public static <T, K, V, M extends Map<K, V>> Collector<T, ?, M> toSequentialMap(
            Supplier<M> mapSupplier, Function<? super T, ? extends K> keyMapper,
            Function<? super T, ? extends V> valueMapper, Collector.Characteristics... characteristics) {
        return Collector.of(
                mapSupplier,
                // Map.merge only invokes the merge function when a key is already
                // present, so mergeUnsupported makes duplicate keys fail fast
                (map, element) -> map.merge(
                        keyMapper.apply(element), valueMapper.apply(element), ExtraCollectors::mergeUnsupported
                ),
                ExtraCollectors::combineUnsupported,
                characteristics
        );
    }

    private static <T> T mergeUnsupported(T valueA, T valueB) {
        throw new UnsupportedOperationException("This Collector does not support merging.");
    }

    private static <A> A combineUnsupported(A accumulatorA, A accumulatorB) {
        throw new UnsupportedOperationException("This Collector does not support parallel streams.");
    }
}

Note that I used Guava's Maps.newHashMapWithExpectedSize so that you get a HashMap with exactly the size you need (it does roughly what Andreas explained in his comment to your question). If you don't have a dependency on Guava (and don't want one), you can just copy the logic of its Maps.capacity method into your codebase.
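For reference, here is a minimal sketch of that computation, modeled on what Guava's newHashMapWithExpectedSize does internally (the real Guava source handles a couple of extra edge cases):

// Initial capacity to pass to the HashMap constructor so that expectedSize
// entries fit without triggering a resize at the default load factor of 0.75.
static int capacity(int expectedSize) {
    if (expectedSize < 3) {
        return expectedSize + 1;
    }
    // Inverse of HashMap's resize threshold (threshold = capacity * loadFactor).
    return (int) ((float) expectedSize / 0.75f + 1.0f);
}

You would then construct the map as new HashMap<>(capacity(objs.size())) instead of new HashMap<>(objs.size(), 1).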

Using the ExtraCollectors.toSizedMap() method defined above, your conversion method would look as follows:

Map<Long, KeyedObject> listToMap(List<? extends KeyedObject> objs) {
    return objs.stream().collect(ExtraCollectors.toSizedMap(KeyedObject::getKey, obj -> obj, objs.size()));
}

Still, if you really want maximum performance (at the price of reusability), you might skip the Stream API altogether and apply your Implementation One, but with Maps.newHashMapWithExpectedSize to get the HashMap's size right, as sketched below.
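A minimal sketch of that variant, assuming the same hypothetical KeyedObject element type as above:

import com.google.common.collect.Maps;

Map<Long, KeyedObject> listToMap(List<? extends KeyedObject> objs) {
    // Pre-sized so that objs.size() entries fit without a resize
    // at the default load factor of 0.75.
    Map<Long, KeyedObject> map = Maps.newHashMapWithExpectedSize(objs.size());
    for (KeyedObject obj : objs) {
        map.put(obj.getKey(), obj);
    }
    return map;
}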

Upvotes: 2

Stephen C

Reputation: 719456

Will the second implementation involve multiple resizes of the HashMap or has it been optimized to allocate just enough memory?

In your code, the former. See https://stackoverflow.com/a/51333961/139985

It is worth noting that, for your current (second) implementation:

  1. Much of the extra memory consumed by resizing will be reclaimed on the next GC run.
  2. After the collect finishes, you could still end up with a main hash array that is up to 2 times too big. The memory "wasted" could be up to 8 bytes per entry in the table, but on average it will be 4 bytes per entry.
  3. Even so, the hash entry nodes will be the largest consumer of memory in a HashMap. Each entry consumes roughly 32 bytes ... in addition to the space used to represent the key and value.

(The above numbers assume 64 bit references.)


As an alternative, if you use the 4-argument overload of toMap() you can provide a Supplier to create the Map to be populated. That allows you to do the following:

  • Allocate a HashMap with an initial capacity large enough to avoid resizing, but not too large (sketched after this list).
  • Use a (hypothetical) alternative implementation of Map which uses less memory per entry than HashMap.
  • Create a wrapper to populate a map-like object that doesn't implement Map<K,V> ... for your K and V types. (For example, you could potentially use TLongObjectHashMap from the GNU Trove library.)

(In the latter two cases, the goal is to find a Map or "map-like" class that uses less memory (for your K and V types) but still has appropriate performance for lookups.)
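For illustration, a minimal sketch of the first option, using the 4-argument toMap() overload and the hypothetical KeyedObject type from the other answers:

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

Map<Long, KeyedObject> listToMap(List<KeyedObject> objs) {
    // Capacity chosen so that objs.size() entries fit under HashMap's
    // default 0.75 load factor without a resize.
    int capacity = (int) (objs.size() / 0.75f) + 1;
    return objs.stream().collect(Collectors.toMap(
            KeyedObject::getKey,
            obj -> obj,
            (a, b) -> { throw new IllegalStateException("Duplicate key"); }, // merge function
            () -> new HashMap<>(capacity)));                                 // map supplier
}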

Upvotes: 7

that other guy

Reputation: 123660

The second implementation will involve multiple resizes of the HashMap.

I determined this by simply running it in a debugger and breaking every time the hash map gets resized. First I tweaked the code you posted to make it compile on my system:

import java.util.*;
import java.util.stream.*;

class Test {
  public static void main(String[] args) {
    List<Object> list = new ArrayList<Object>();
    for (int i = 0; i < 100000; i++) {
      list.add(new Integer(i));
    }
    new Test().listToMap(list);
  }

  Map<Integer, Object> listToMap(List<Object> objs) {
    return objs.stream().collect(Collectors.toMap(Object::hashCode, obj -> obj));
  }
}

Then I compiled it and ran it in the debugger until it hit listToMap:

$ javac Test.java && jdb Test
Initializing jdb ...
> stop in Test.listToMap
Deferring breakpoint Test.listToMap.
It will be set after the class is loaded.
> run
run Test
Set uncaught java.lang.Throwable
Set deferred uncaught java.lang.Throwable
>
VM Started: Set deferred breakpoint Test.listToMap

Breakpoint hit: "thread=main", Test.listToMap(), line=14 bci=0
14            return objs.stream().collect(Collectors.toMap(Object::hashCode, obj -> obj));

main[1]

Then I set a breakpoint in java.util.HashMap.resize and continued:

main[1] stop in java.util.HashMap.resize
Set breakpoint java.util.HashMap.resize
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0

main[1]

and continued some more until I got bored:

main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0

main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0

main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0

main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0

main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0

main[1] print size
 size = 3073
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0

main[1] print size
 size = 6145
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0

main[1] print size
 size = 12289

So yes: it most definitely keeps resizing over and over.

Upvotes: 10
