Learn Hadoop

Reputation: 3050

Java Streams – How to group by value and find min and max value of each group?

For example, I have a list of Car objects and found the min and max price for each make (group by):

List<Car> carsDetails = UserDB.getCarsDetails();
Map<String, DoubleSummaryStatistics> collect4 = carsDetails.stream()
                .collect(Collectors.groupingBy(Car::getMake, Collectors.summarizingDouble(Car::getPrice)));
collect4.entrySet().forEach(e->System.out.println(e.getKey()+" "+e.getValue().getMax()+" "+e.getValue().getMin()));

Output:
Lexus 94837.79 17569.59
Subaru 96583.25 8498.41
Chevrolet 99892.59 6861.85

But I couldn't find which Car objects have the max and min price. How can I do that?

Upvotes: 24

Views: 33305

Answers (4)

vaibhav verma

Reputation: 21

For the Car with the max price of each make:

Map<String, Optional<Car>> groupByMaxPrice =
             carsDetails.stream().collect(
                     Collectors.groupingBy(Car::getMake, Collectors.maxBy(Comparator.comparing(Car::getPrice))));

For the Car with the min price of each make:

Map<String, Optional<Car>> groupByMinPrice =
             carsDetails.stream().collect(
                     Collectors.groupingBy(Car::getMake, Collectors.minBy(Comparator.comparing(Car::getPrice))));
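
To check both lookups end to end, here is a minimal self-contained sketch. The Car class, its getName() accessor, and the helper class MinMaxByMake are hypothetical stand-ins, not part of the question's code. It also wraps the downstream collector in Collectors.collectingAndThen with Optional::get, which should be safe here because groupingBy only creates a group once it has at least one element.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical stand-in for the question's Car class.
final class Car {
    private final String make;
    private final String name;
    private final double price;

    Car(String make, String name, double price) {
        this.make = make;
        this.name = name;
        this.price = price;
    }

    String getMake() { return make; }
    String getName() { return name; }
    double getPrice() { return price; }
}

// Hypothetical helper class holding the two lookups.
final class MinMaxByMake {
    // Cheapest car per make; the Optional from minBy is unwrapped per group.
    static Map<String, Car> cheapest(List<Car> cars) {
        return cars.stream().collect(Collectors.groupingBy(Car::getMake,
                Collectors.collectingAndThen(
                        Collectors.minBy(Comparator.comparingDouble(Car::getPrice)),
                        Optional::get)));
    }

    // Most expensive car per make; same shape, with maxBy instead of minBy.
    static Map<String, Car> mostExpensive(List<Car> cars) {
        return cars.stream().collect(Collectors.groupingBy(Car::getMake,
                Collectors.collectingAndThen(
                        Collectors.maxBy(Comparator.comparingDouble(Car::getPrice)),
                        Optional::get)));
    }
}
```

Calling MinMaxByMake.cheapest(carsDetails).get("Lexus") would then return the Car itself rather than an Optional.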

Upvotes: 2

rolve

Reputation: 10218

Here is a very concise solution. It collects all Cars into a SortedSet and thus works without any additional classes.

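// Assumes static imports of Collectors.groupingBy, Collectors.toCollection
// and Comparator.comparingDouble.
Map<String, SortedSet<Car>> grouped = carsDetails.stream()
        .collect(groupingBy(Car::getMake, toCollection(
                () -> new TreeSet<>(comparingDouble(Car::getPrice)))));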

grouped.forEach((make, cars) -> System.out.println(make
        + " cheapest: " + cars.first()
        + " most expensive: " + cars.last()));

A possible downside is performance, as all Cars are collected, not just the current min and max. But unless the data set is very large, I don't think it will be noticeable.
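
One more thing worth checking with this approach: a TreeSet treats elements that compare as equal as duplicates, so with a price-only comparator two cars of the same make and the same price collapse into a single entry. The min and max prices stay correct, but only the first such car is kept. A minimal sketch of one way around that, assuming a hypothetical Car class whose getName() accessor serves as a tie-breaker (the helper class SortedByPrice is also made up for illustration):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical stand-in for the question's Car class.
final class Car {
    private final String make;
    private final String name;
    private final double price;

    Car(String make, String name, double price) {
        this.make = make;
        this.name = name;
        this.price = price;
    }

    String getMake() { return make; }
    String getName() { return name; }
    double getPrice() { return price; }
}

// Hypothetical helper class for the grouping.
final class SortedByPrice {
    static Map<String, SortedSet<Car>> group(List<Car> cars) {
        // Order by price first; fall back to the name so that cars with
        // equal prices are not treated as duplicates by the TreeSet.
        Comparator<Car> byPriceThenName =
                Comparator.comparingDouble(Car::getPrice).thenComparing(Car::getName);
        return cars.stream().collect(Collectors.groupingBy(Car::getMake,
                Collectors.toCollection(() -> new TreeSet<>(byPriceThenName))));
    }
}
```

As before, first() and last() on each set give the cheapest and the most expensive car of that make.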

Upvotes: 8

Tomasz Linkowski

Reputation: 4496

I would like to propose a solution that, in my opinion, aims for the greatest readability, which in turn reduces the maintenance burden of such code.

It is Collector-based, so as a bonus it also works with a parallel Stream. It assumes the stream elements are non-null.

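import java.util.Comparator;
import java.util.function.BinaryOperator;
import java.util.stream.Collector;

final class MinMaxFinder<T> {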

    private final Comparator<T> comparator;

    MinMaxFinder(Comparator<T> comparator) {
        this.comparator = comparator;
    }

    Collector<T, ?, MinMaxResult<T>> collector() {
        return Collector.of(
                MinMaxAccumulator::new,
                MinMaxAccumulator::add,
                MinMaxAccumulator::combine,
                MinMaxAccumulator::toResult
        );
    }

    private class MinMaxAccumulator {
        T min = null;
        T max = null;

        MinMaxAccumulator() {
        }

        private boolean isEmpty() {
            return min == null;
        }

        void add(T item) {
            if (isEmpty()) {
                min = max = item;
            } else {
                updateMin(item);
                updateMax(item);
            }
        }

        MinMaxAccumulator combine(MinMaxAccumulator otherAcc) {
            if (isEmpty()) {
                return otherAcc;
            }
            if (!otherAcc.isEmpty()) {
                updateMin(otherAcc.min);
                updateMax(otherAcc.max);
            }
            return this;
        }

        private void updateMin(T item) {
            min = BinaryOperator.minBy(comparator).apply(min, item);
        }

        private void updateMax(T item) {
            max = BinaryOperator.maxBy(comparator).apply(max, item);
        }

        MinMaxResult<T> toResult() {
            return new MinMaxResult<>(min, max);
        }
    }
}

The result-holder value-like class:

public class MinMaxResult<T> {
    private final T min;
    private final T max;

    public MinMaxResult(T min, T max) {
        this.min = min;
        this.max = max;
    }

    public T min() {
        return min;
    }

    public T max() {
        return max;
    }
}

Usage:

MinMaxFinder<Car> minMaxFinder = new MinMaxFinder<>(Comparator.comparing(Car::getPrice));
Map<String, MinMaxResult<Car>> minMaxResultMap = carsDetails.stream()
            .collect(Collectors.groupingBy(Car::getMake, minMaxFinder.collector()));

Upvotes: 1

Holger

Reputation: 298123

If you were interested in only one Car per group, you could use, e.g.

Map<String, Car> mostExpensives = carsDetails.stream()
    .collect(Collectors.toMap(Car::getMake, Function.identity(),
        BinaryOperator.maxBy(Comparator.comparing(Car::getPrice))));
mostExpensives.forEach((make,car) -> System.out.println(make+" "+car));

But since you want the most expensive and the cheapest, you need something like this:

Map<String, List<Car>> mostExpensivesAndCheapest = carsDetails.stream()
    .collect(Collectors.toMap(Car::getMake, car -> Arrays.asList(car, car),
        (l1,l2) -> Arrays.asList(
            (l1.get(0).getPrice()>l2.get(0).getPrice()? l2: l1).get(0),
            (l1.get(1).getPrice()<l2.get(1).getPrice()? l2: l1).get(1))));
mostExpensivesAndCheapest.forEach((make,cars) -> System.out.println(make
        +" cheapest: "+cars.get(0)+" most expensive: "+cars.get(1)));

This solution is a bit inconvenient because there is no generic statistics object equivalent to DoubleSummaryStatistics. If this happens more than once, it’s worth filling the gap with a class like this:

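import java.util.Comparator;
import java.util.Objects;
import java.util.function.Consumer;
import java.util.stream.Collector;

/**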
 * Like {@code DoubleSummaryStatistics}, {@code IntSummaryStatistics}, and
 * {@code LongSummaryStatistics}, but for an arbitrary type {@code T}.
 */
public class SummaryStatistics<T> implements Consumer<T> {
    /**
     * Collect to a {@code SummaryStatistics} for natural order.
     */
    public static <T extends Comparable<? super T>> Collector<T,?,SummaryStatistics<T>>
                  statistics() {
        return statistics(Comparator.<T>naturalOrder());
    }
    /**
     * Collect to a {@code SummaryStatistics} using the specified comparator.
     */
    public static <T> Collector<T,?,SummaryStatistics<T>>
                  statistics(Comparator<T> comparator) {
        Objects.requireNonNull(comparator);
        return Collector.of(() -> new SummaryStatistics<>(comparator),
            SummaryStatistics::accept, SummaryStatistics::merge);
    }
    private final Comparator<T> c;
    private T min, max;
    private long count;
    public SummaryStatistics(Comparator<T> comparator) {
        c = Objects.requireNonNull(comparator);
    }

    public void accept(T t) {
        if(count == 0) {
            count = 1;
            min = t;
            max = t;
        }
        else {
            if(c.compare(min, t) > 0) min = t;
            if(c.compare(max, t) < 0) max = t;
            count++;
        }
    }
    public SummaryStatistics<T> merge(SummaryStatistics<T> s) {
        if(s.count > 0) {
            if(count == 0) {
                count = s.count;
                min = s.min;
                max = s.max;
            }
            else {
                if(c.compare(min, s.min) > 0) min = s.min;
                if(c.compare(max, s.max) < 0) max = s.max;
                count += s.count;
            }
        }
        return this;
    }

    public long getCount() {
        return count;
    }

    public T getMin() {
        return min;
    }

    public T getMax() {
        return max;
    }

    @Override
    public String toString() {
        return count == 0? "empty": (count+" elements between "+min+" and "+max);
    }
}

After adding this to your code base, you can use it like this:

Map<String, SummaryStatistics<Car>> mostExpensives = carsDetails.stream()
    .collect(Collectors.groupingBy(Car::getMake,
        SummaryStatistics.statistics(Comparator.comparing(Car::getPrice))));
mostExpensives.forEach((make,cars) -> System.out.println(make+": "+cars));

If getPrice returns double, it may be more efficient to use Comparator.comparingDouble(Car::getPrice) instead of Comparator.comparing(Car::getPrice).
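
A sketch of that substitution, applied to the one-car-per-group variant. It assumes getPrice returns a primitive double; the Car class and the helper class MostExpensiveByMake are hypothetical stand-ins. The idea is that comparingDouble compares the primitive values directly, whereas comparing has to box each price into a Double first.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for the question's Car class.
final class Car {
    private final String make;
    private final String name;
    private final double price;

    Car(String make, String name, double price) {
        this.make = make;
        this.name = name;
        this.price = price;
    }

    String getMake() { return make; }
    String getName() { return name; }
    double getPrice() { return price; } // primitive double, so no boxing is needed
}

// Hypothetical helper class for the lookup.
final class MostExpensiveByMake {
    static Map<String, Car> mostExpensive(List<Car> cars) {
        // The merge function keeps the pricier of two cars sharing a make.
        return cars.stream().collect(Collectors.toMap(
                Car::getMake,
                Function.identity(),
                BinaryOperator.maxBy(Comparator.comparingDouble(Car::getPrice))));
    }
}
```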

Upvotes: 39
