Reputation: 1371
I have a function that converts a list to a map. The map's size won't change after this function is called. I'm trying to decide between the following two implementations:
Map<Long, Object> listToMap(List<Object> objs) {
    /* Implementation One: */
    Map<Long, Object> map = new HashMap<>(objs.size(), 1);
    for (Object obj : objs) {
        map.put(obj.getKey(), obj);
    }
    return map;

    /* Implementation Two: */
    return objs.stream().collect(Collectors.toMap(Object::getKey, obj -> obj));
}
In the first implementation, I've allocated just enough memory for all the elements by using a load factor of 1 and the list's size. This ensures that a resize operation won't be performed. Then, I iterate through the list and add elements one-by-one.
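For concreteness, here is my understanding of the sizing behavior (a sketch of JDK 8's HashMap internals, not something the API guarantees):

// HashMap rounds the requested capacity up to the next power of two, and
// resizes only once size exceeds capacity * loadFactor. With loadFactor = 1:
Map<Long, Object> map = new HashMap<>(1000, 1.0f);
// internal capacity -> 1024, resize threshold -> 1024 * 1.0f = 1024,
// so inserting 1000 entries never triggers a resize.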
In the second implementation, I use Java 8 streams for better readability.
My question is: will the second implementation involve multiple resizes of the HashMap or has it been optimized to allocate just enough memory?
Upvotes: 12
Views: 2283
Reputation: 4496
Summing up what everyone else said and adding a little, here's a way to do it using a custom Collector. However, you should keep in mind two things:

1. Continuing the thought of Stephen C from his answer, you shouldn't really worry about optimizing such cases until you've found out that it really is a performance bottleneck in your application. As Donald Knuth said, "premature optimization is the root of all evil".

2. As shmosel pointed out in a comment, a Collector that allocates a HashMap with a predefined size will over-allocate if said Collector is used in parallel mode. Therefore, the Collector I propose doesn't support parallel collection.

Having said that, you can write the following generic Collectors:
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class ExtraCollectors {

    /** Collects to a HashMap pre-sized to hold {@code size} entries without resizing. */
    public static <T, K, V> Collector<T, ?, HashMap<K, V>> toSizedMap(
            Function<? super T, ? extends K> keyMapper, Function<? super T, ? extends V> valueMapper, int size) {
        return toSequentialMap(
                () -> com.google.common.collect.Maps.newHashMapWithExpectedSize(size),
                keyMapper, valueMapper, Collector.Characteristics.UNORDERED
        );
    }

    /** Collects to the supplied Map, rejecting duplicate keys and parallel streams. */
    public static <T, K, V, M extends Map<K, V>> Collector<T, ?, M> toSequentialMap(
            Supplier<M> mapSupplier, Function<? super T, ? extends K> keyMapper,
            Function<? super T, ? extends V> valueMapper, Collector.Characteristics... characteristics) {
        return Collector.of(
                mapSupplier,
                (map, element) -> map.merge(
                        keyMapper.apply(element), valueMapper.apply(element), ExtraCollectors::mergeUnsupported
                ),
                ExtraCollectors::combineUnsupported,
                characteristics
        );
    }

    private static <T> T mergeUnsupported(T valueA, T valueB) {
        throw new UnsupportedOperationException("This Collector does not support merging.");
    }

    private static <A> A combineUnsupported(A accumulatorA, A accumulatorB) {
        throw new UnsupportedOperationException("This Collector does not support parallel streams.");
    }
}
Note that I used Guava's Maps.newHashMapWithExpectedSize so that you get a HashMap with exactly the size you need (it does roughly what Andreas explained in his comment to your question). If you don't have a dependency on Guava (and don't want one), you can just copy the Maps.capacity method to your codebase.
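If you go the copy-it route, the gist of that method is small. A sketch of the idea (not Guava's exact source):

// Computes an initial HashMap capacity such that expectedSize entries fit
// under the default load factor of 0.75 without triggering a resize.
static int capacity(int expectedSize) {
    if (expectedSize < 3) {
        return expectedSize + 1;
    }
    // Matches the calculation the JDK itself uses when sizing for putAll.
    return (int) ((float) expectedSize / 0.75f + 1.0f);
}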
Using the ExtraCollectors.toSizedMap() method defined above, your conversion method would look as follows:
Map<Long, KeyedObject> listToMap(List<? extends KeyedObject> objs) {
    return objs.stream().collect(ExtraCollectors.toSizedMap(KeyedObject::getKey, obj -> obj, objs.size()));
}
Still, if you really want maximum performance (at the price of reusability), you might skip the Stream API altogether and apply your Implementation One, but with Maps.newHashMapWithExpectedSize to get the HashMap's size right.
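For reference, a sketch of that variant (again with the hypothetical KeyedObject type):

Map<Long, KeyedObject> listToMap(List<? extends KeyedObject> objs) {
    // Pre-sized so that objs.size() entries fit without a resize.
    Map<Long, KeyedObject> map = Maps.newHashMapWithExpectedSize(objs.size());
    for (KeyedObject obj : objs) {
        map.put(obj.getKey(), obj);
    }
    return map;
}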
Upvotes: 2
Reputation: 719456
Will the second implementation involve multiple resizes of the HashMap or has it been optimized to allocate just enough memory?
In your code, the former. See https://stackoverflow.com/a/51333961/139985
It is worth noting that for your current implementation:

- Even after collect finishes, you could still end up with a main hash array that is up to 2 times too big. The memory "wasted" could be up to 8 bytes per entry in the table, but on average it will be 4 bytes per entry.
- The entries themselves are the biggest memory consumer in a HashMap. Each entry consumes roughly 32 bytes ... in addition to the space used to represent the key and value. (See the sketch after this list.)

(The above numbers assume 64 bit references.)
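For context, each entry is an instance of HashMap's internal node class, which in JDK 8 looks roughly like this (simplified):

// Simplified shape of java.util.HashMap.Node in JDK 8: an object header,
// a cached hash code, and three references per entry -- this is where the
// "roughly 32 bytes" per entry comes from, before counting the key and
// value objects themselves.
static class Node<K, V> {
    final int hash;   // cached hash of the key
    final K key;
    V value;
    Node<K, V> next;  // next node in the same bucket (collision chain)
}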
As an alternative, if you use the 4 argument overload of toMap() you can provide a Supplier to create the Map to be populated. That allows you to do the following (a sketch of the first option follows the list):

- Allocate a HashMap with an initial capacity large enough to avoid resizing, but not too large.
- Use an alternative implementation of Map which uses less memory per entry than HashMap.
- Use a custom implementation of Map<K,V> ... for your K and V types. (For example, you could potentially use TLongObjectHashMap from the GNU Trove library.)

(In the latter two cases, the goal is to find a Map or "map-like" class that uses less memory (for your K and V types) but still has appropriate performance for lookups.)
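A minimal sketch of the first option, reusing the hypothetical KeyedObject type from the other answer:

Map<Long, KeyedObject> listToMap(List<? extends KeyedObject> objs) {
    return objs.stream().collect(Collectors.toMap(
            KeyedObject::getKey,
            obj -> obj,
            // The 4-argument overload requires a merge function; duplicate
            // keys are treated as a bug here.
            (a, b) -> { throw new IllegalStateException("Duplicate key"); },
            // Supplier that pre-sizes the HashMap so that objs.size()
            // entries fit under the default load factor of 0.75.
            () -> new HashMap<>((int) (objs.size() / 0.75f) + 1)));
}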
Upvotes: 7
Reputation: 123660
The second implementation will involve multiple resizes of the HashMap.
I determined this by simply running it in a debugger and breaking every time the hash map gets resized. First I tweaked the code you posted to make it compile on my system:
import java.util.*;
import java.util.stream.*;

class Test {
    public static void main(String[] args) {
        List<Object> list = new ArrayList<Object>();
        for (int i = 0; i < 100000; i++) {
            list.add(new Integer(i));
        }
        new Test().listToMap(list);
    }

    Map<Integer, Object> listToMap(List<Object> objs) {
        return objs.stream().collect(Collectors.toMap(Object::hashCode, obj -> obj));
    }
}
Then I compiled it and ran it in the debugger until it hit listToMap:
$ javac Test.java && jdb Test
Initializing jdb ...
> stop in Test.listToMap
Deferring breakpoint Test.listToMap.
It will be set after the class is loaded.
> run
run Test
Set uncaught java.lang.Throwable
Set deferred uncaught java.lang.Throwable
>
VM Started: Set deferred breakpoint Test.listToMap
Breakpoint hit: "thread=main", Test.listToMap(), line=14 bci=0
14 return objs.stream().collect(Collectors.toMap(Object::hashCode, obj -> obj));
main[1]
Then I set a breakpoint in java.util.HashMap.resize and continued:
main[1] stop in java.util.HashMap.resize
Set breakpoint java.util.HashMap.resize
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1]
and continued some more until I got bored:
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1] print size
size = 3073
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1] print size
size = 6145
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1] print size
size = 12289
So yes: it most definitely keeps resizing over and over.
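As a back-of-the-envelope check (my own arithmetic, assuming the default load factor of 0.75), the printed sizes sit exactly one past HashMap's resize thresholds:

// A resize fires once size exceeds capacity * 0.75, and capacities double
// from the default of 16. So 3073, 6145, and 12289 are one past the
// thresholds 3072 (capacity 4096), 6144 (8192), and 12288 (16384).
// Filling 100,000 entries this way takes about 14 doublings:
for (int capacity = 16; capacity * 0.75 < 100_000; capacity *= 2) {
    System.out.println("resize when size exceeds " + (int) (capacity * 0.75));
}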
Upvotes: 10