Paul Boddington
Paul Boddington

Reputation: 37655

How to efficiently compute the maximum value of a collection after applying some function

Suppose you have a method like this that computes the maximum of a Collection for some ToIntFunction:

static <T> void foo1(Collection<? extends T> collection, ToIntFunction<? super T> function) {
    if (collection.isEmpty())
        throw new NoSuchElementException();
    int max = Integer.MIN_VALUE;
    T maxT = null;
    for (T t : collection) {
        int result = function.applyAsInt(t);
        if (result >= max) {
            max = result;
            maxT = t;
        }
    }
    // do something with maxT
}

With Java 8, this could be translated into

static <T> void foo2(Collection<? extends T> collection, ToIntFunction<? super T> function) {
    T maxT = collection.stream()
                       .max(Comparator.comparingInt(function))
                       .get();
    // do something with maxT
}

A disadvantage with the new version is that function.applyAsInt is invoked repeatedly for the same value of T. (Specifically if the collection has size n, foo1 invokes applyAsInt n times whereas foo2 invokes it 2n - 2 times).

Disadvantages of the first approach are that the code is less clear and you can't modify it to use parallelism.

Suppose you wanted to do this using parallel streams and only invoke applyAsInt once per element. Can this be written in a simple way?

Upvotes: 16

Views: 912

Answers (4)

fps
fps

Reputation: 34460

Another way would be to use a memoized version of function:

static <T> void foo2(Collection<? extends T> collection, 
    ToIntFunction<? super T> function, T defaultValue) {

    T maxT = collection.parallelStream()
        .max(Comparator.comparingInt(ToIntMemoizer.memoize(function)))
        .orElse(defaultValue);

    // do something with maxT

}

Where ToIntMemoizer.memoize(function) code would be as follows:

public class ToIntMemoizer<T> {

    private final Map<T, Integer> cache = new ConcurrentHashMap<>();

    private ToIntMemoizer() {
    }

    private ToIntFunction<T> doMemoize(ToIntFunction<T> function) {
        return input -> cache.computeIfAbsent(input, function::apply);
    }

    public static <T> ToIntFunction<T> memoize(ToIntFunction<T> function) {
        return new ToIntMemoizer<T>().doMemoize(function);
    }
}

This uses a ConcurrentHashMap to cache already computed results. If you don't need to support parallelism, you can perfectly use a HashMap.

One disadvantage is that the result of the function needs to be boxed/unboxed. On the other hand, as the function is memoized, a result will be computed only once for each repeated element of the collection. Then, if the function is invoked with a repeated input value, the result will be returned from the cache.

Upvotes: 2

Tagir Valeev
Tagir Valeev

Reputation: 100259

If you don't mind using third-party library, my StreamEx optimizes all these cases in special methods like maxByInt and so on. So you can simply use:

static <T> void foo3(Collection<? extends T> collection, ToIntFunction<? super T> function) {
    T maxT = StreamEx.of(collection).parallel()
                       .maxByInt(function)
                       .get();
    // do something with maxT
}

The implementation uses reduce with mutable container. This probably abuses API a little, but works fine for sequential and parallel streams and unlike collect solution defers the container allocation to the first accumulated element (thus no container is allocated if parallel subtask covers no elements which occurs quite often if you have the filtering operation upstream).

Upvotes: 1

Tunaki
Tunaki

Reputation: 137164

You can use a custom collector that keeps running pair of the maximum value and the maximum element:

static <T> void foo3(Collection<? extends T> collection, ToIntFunction<? super T> function) {
    class Pair {
        int max = Integer.MIN_VALUE;
        T maxT = null;
    }
    T maxT = collection.stream().collect(Collector.of(
        Pair::new,
        (p, t) -> {
            int result = function.applyAsInt(t);
            if (result >= p.max) {
                p.max = result;
                p.maxT = t;
            }
        }, 
        (p1, p2) -> p2.max > p1.max ? p2 : p1,
        p -> p.maxT
    ));
    // do something with maxT
}

One advantage is that this creates a single Pair intermediate object that is used through-out the collecting process. Each time an element is accepted, this holder is updated with the new maximum. The finisher operation just returns the maximum element and disgards the maximum value.

Upvotes: 10

Flown
Flown

Reputation: 11740

As I stated in the comments I would suggest introducing an intermediate datastructure like:

static <T> void foo2(Collection<? extends T> collection, ToIntFunction<? super T> function) {
  if (collection.isEmpty()) {
    throw new IllegalArgumentException();
  }
  class Pair {
    final T value;
    final int result;

    public Pair(T value, int result) {
      this.value = value;
      this.result = result;
    }

    public T getValue() {
      return value;
    }

    public int getResult() {
      return result;
    }
  }
  T maxT = collection.stream().map(t -> new Pair(t, function.applyAsInt(t)))
                     .max(Comparator.comparingInt(Pair::getResult)).get().getValue();
  // do something with maxT
}

Upvotes: 3

Related Questions