Reputation: 37655
Suppose you have a method like this that computes the maximum of a Collection
for some ToIntFunction
:
static <T> void foo1(Collection<? extends T> collection, ToIntFunction<? super T> function) {
if (collection.isEmpty())
throw new NoSuchElementException();
int max = Integer.MIN_VALUE;
T maxT = null;
for (T t : collection) {
int result = function.applyAsInt(t);
if (result >= max) {
max = result;
maxT = t;
}
}
// do something with maxT
}
With Java 8, this could be translated into
static <T> void foo2(Collection<? extends T> collection, ToIntFunction<? super T> function) {
T maxT = collection.stream()
.max(Comparator.comparingInt(function))
.get();
// do something with maxT
}
A disadvantage with the new version is that function.applyAsInt
is invoked repeatedly for the same value of T
. (Specifically if the collection has size n
, foo1
invokes applyAsInt
n
times whereas foo2
invokes it 2n - 2
times).
Disadvantages of the first approach are that the code is less clear and you can't modify it to use parallelism.
Suppose you wanted to do this using parallel streams and only invoke applyAsInt
once per element. Can this be written in a simple way?
Upvotes: 16
Views: 912
Reputation: 34460
Another way would be to use a memoized version of function
:
static <T> void foo2(Collection<? extends T> collection,
ToIntFunction<? super T> function, T defaultValue) {
T maxT = collection.parallelStream()
.max(Comparator.comparingInt(ToIntMemoizer.memoize(function)))
.orElse(defaultValue);
// do something with maxT
}
Where ToIntMemoizer.memoize(function)
code would be as follows:
public class ToIntMemoizer<T> {
private final Map<T, Integer> cache = new ConcurrentHashMap<>();
private ToIntMemoizer() {
}
private ToIntFunction<T> doMemoize(ToIntFunction<T> function) {
return input -> cache.computeIfAbsent(input, function::apply);
}
public static <T> ToIntFunction<T> memoize(ToIntFunction<T> function) {
return new ToIntMemoizer<T>().doMemoize(function);
}
}
This uses a ConcurrentHashMap
to cache already computed results. If you don't need to support parallelism, you can perfectly use a HashMap
.
One disadvantage is that the result of the function needs to be boxed/unboxed. On the other hand, as the function is memoized, a result will be computed only once for each repeated element of the collection. Then, if the function is invoked with a repeated input value, the result will be returned from the cache.
Upvotes: 2
Reputation: 100259
If you don't mind using third-party library, my StreamEx optimizes all these cases in special methods like maxByInt
and so on. So you can simply use:
static <T> void foo3(Collection<? extends T> collection, ToIntFunction<? super T> function) {
T maxT = StreamEx.of(collection).parallel()
.maxByInt(function)
.get();
// do something with maxT
}
The implementation uses reduce
with mutable container. This probably abuses API a little, but works fine for sequential and parallel streams and unlike collect
solution defers the container allocation to the first accumulated element (thus no container is allocated if parallel subtask covers no elements which occurs quite often if you have the filtering operation upstream).
Upvotes: 1
Reputation: 137164
You can use a custom collector that keeps running pair of the maximum value and the maximum element:
static <T> void foo3(Collection<? extends T> collection, ToIntFunction<? super T> function) {
class Pair {
int max = Integer.MIN_VALUE;
T maxT = null;
}
T maxT = collection.stream().collect(Collector.of(
Pair::new,
(p, t) -> {
int result = function.applyAsInt(t);
if (result >= p.max) {
p.max = result;
p.maxT = t;
}
},
(p1, p2) -> p2.max > p1.max ? p2 : p1,
p -> p.maxT
));
// do something with maxT
}
One advantage is that this creates a single Pair
intermediate object that is used through-out the collecting process. Each time an element is accepted, this holder is updated with the new maximum. The finisher operation just returns the maximum element and disgards the maximum value.
Upvotes: 10
Reputation: 11740
As I stated in the comments I would suggest introducing an intermediate datastructure like:
static <T> void foo2(Collection<? extends T> collection, ToIntFunction<? super T> function) {
if (collection.isEmpty()) {
throw new IllegalArgumentException();
}
class Pair {
final T value;
final int result;
public Pair(T value, int result) {
this.value = value;
this.result = result;
}
public T getValue() {
return value;
}
public int getResult() {
return result;
}
}
T maxT = collection.stream().map(t -> new Pair(t, function.applyAsInt(t)))
.max(Comparator.comparingInt(Pair::getResult)).get().getValue();
// do something with maxT
}
Upvotes: 3