Silver Quettier
Silver Quettier

Reputation: 2050

How to filter out only the first element not matching a predicate in a Java sequential Stream?

I'm stuck on an edge case in java stream manipulations...

I want to code the following behavior: "From an arbitrary basket of fruits, collect the 20 smallest, except the smallest pear, because we don't want that."

Added bonus: the baskets to come might not have any pear at all.

Examples :

So far, I'm at this step:

output = basket.stream()
    .sorted(Comparator.comparing(Fruit::getSize))
    //.filter(???)
    .limit(20)
    .collect(fruitCollector);

This seems like a case of stateful lambda filter, and I don't know how to do that.

I can't use a local firstPear boolean and set it to true after filtering the first pear, since all local variables in a lambda must be final.

Worst case scenario I can split the basket in two, pears and non-pears, sort the pears, and sublist them appropriately if there is any. This seems very inefficient and ugly. Is there a better way?


[Edit] Answer comparison

There was much variety in the answers posted here, and most of them are valid. In order to give back to the community, I put together a small testing harness to compare the performance of these algorithms.

This comparison was not as extensive as I wanted - It's been 3 weeks already. It only covers usage for sequential processing of simple items. Feel free to give the testing harness a go, and add more tests, more benchmarks, or your own implementation.

My analysis:

Algorithm                | Author   | Perf | Comments
--------------------------------------------------------------------------------
Indexed removal          | Holger   | Best | Best overall, somewhat obscure
Stateful predicate       | pedromss | Best | Do not use for parallel processing
Straightforward approach | Misha    | Best | Better when few elements match
Custom collector         | Eugene   | Good | Better when all or no element match
Comaprator hack w/ dummy | yegodm   | Good | -
Comparator hack          | xenteros | *    | Perf sensitive to output size, fails on edge cases.

I acecpted pedromss' answer, as it's the one we implemented in the project, due to both its good performance, and "black-box" capabilities (the state-managing code is in an external class, and contributors can focus on the business logic).

Note that the accepted answer might not be the best for you: review the others, or check my testing project to see for yourself.

Upvotes: 18

Views: 4950

Answers (10)

Charles
Charles

Reputation: 1

I have the same problem but solved by myself using a Map and ignore list. Here's the sample for your information. Hope can help.

@Test
public void testGetStckTraceElements() {
    StackTraceElement[] stElements = Thread.currentThread().getStackTrace();

    // define a list for filter out
    List<String> ignoreClasses = Arrays.asList(
            Thread.class.getName(),
            this.getClass().getName()
    );

    // Map is using for check found before or not
    Map<String,Boolean> findFrist = new HashMap<String,Boolean>();
    Arrays.asList(stElements).stream()
        .filter(s -> {
            Platform.print("check: {}", s.getClassName());
            if (Optional.ofNullable(findFrist.get(s.getClassName())).orElse(false)) {
                return true;
            }
            findFrist.put(s.getClassName(), true);
            for (String className:ignoreClasses) {
                if (s.getClassName().equals(className)) return false;
            }

            return true;

        })
        .forEach(s->{
            Platform.print("Result: {} {} {} {}", s.getClassName(), s.getMethodName(), s.getFileName(), s.getLineNumber());
    });

}

Upvotes: 0

Holger
Holger

Reputation: 298539

Don’t try to filter upfront. Consider

List<Fruit> output = basket.stream()
        .sorted(Comparator.comparing(Fruit::getSize))
        .limit(21)
        .collect(Collectors.toCollection(ArrayList::new));
int index = IntStream.range(0, output.size())
                     .filter(ix -> output.get(ix).isPear())
                     .findFirst().orElse(20);
if(index < output.size()) output.remove(index);

Just limit to 21 elements instead of 20 to be able to remove one. By using Collectors.toCollection(ArrayList::new) you ensure to receive a mutable collection.

Then, there are three scenarios

  1. The list contains a Pear. Since the list is sorted by the fruit sizes, the first Pear will also be the smallest Pear, which is the one that has to be removed. The subsequent … .findFirst() will evaluate to the index of the element.

  2. The list does not contain a Pear but has a size of 21. In this case, we have to remove the last element, i.e. at index 20, to get the desired result size. This is provided by .orElse(20), which will map an empty OptionalInt to 20.

  3. The list may not contain any Pear and be smaller than 21, because the source list was already smaller. In this case, we don’t remove any elements, checked by prepending the remove operation with if(index < output.size()).

This entire post-processing can be considered irrelevant to the performance, as we already know beforehand, that it will be applied to a very small list, having at most 21 elements in this example. This is independent to the size of the source basket list.

Upvotes: 2

holi-java
holi-java

Reputation: 30696

I think the Predicate is the atomic operator of your operation. So the simplest way is write your own Predicate to wrap the original Predicate. let's say the wrap named as once then your code can be simplify down to as below:

output = basket.stream().sorted(comparing(Fruit::getSize))
                        .filter(once(Fruit::isPear))
                        .limit(20).collect(fruitCollector);

static <T> Predicate<T> once(Predicate<T> predicate){
   boolean[] seen = {true};
   return it -> !seen[0] || (seen[0]=predicate.test(it));
}

If you want to support concurrent you can use an AtomicInteger instead, for example:

static <T> Predicate<T> once(Predicate<T> predicate){
   AtomicInteger seen = new AtomicInteger(0);

   return it -> {
     //if seen==0 then test predicate, otherwise increment only 
     IntBinaryOperator accumulator = (x,y)-> x==0 && predicate.test(it) ? x : x+y;
     return seen.accumulateAndGet(1, accumulator) != 1; 
   };
}

Upvotes: 0

123-xyz
123-xyz

Reputation: 637

[Update], after reading the updated OP, I have a better understanding about the requirements: Here is the updated code by StreamEx:

Optional<Integer> smallestPear = StreamEx.of(basket).filter(Fruit::isPear)
                                         .mapToInt(Fruit::getSize).min();

StreamEx.of(basket)
        .chain(s -> smallestPear.map(v -> s.remove(f -> f.isPear() && f.getSize() == v).orElse(s))
        .sortedBy(Fruit::getSize).limit(20).toList();

[update again] The above solution is pretty similar with the solution provided by Misha. if you don't want to go through the stream twice, Here is another solution by limited Predicate if the pair of (fruit type, size) in basket are unique:

// Save this method in your toolkit.
public class Fn {
    public static <T> Predicate<T> limited(final Predicate<T> predicate, final int limit) {
        Objects.requireNonNull(predicate);    
        return new Predicate<T>() {
            private final AtomicInteger counter = new AtomicInteger(limit);
            @Override
            public boolean test(T t) {
                return predicate.test(t) && counter.decrementAndGet() >= 0;
            }
        };
    }
}

StreamEx.of(basket).sortedBy(Fruit::getSize)
        .remove(f -> Fn.limited(Fruit::isPear, 1))
        .limit(20).toList();

Upvotes: 0

Misha
Misha

Reputation: 28173

Have you considered a straightforward approach? Find the smallest pear, filter it out (if it exists) and collect 20 smallest:

Optional<Fruit> smallestPear = basket.stream()
        .filter(Fruit::isPear)  // or whatever it takes to test if it's a pear
        .min(Fruit::getSize);

Stream<Fruit> withoutSmallestPear = smallestPear
        .map(p -> basket.stream().filter(f -> f != p))
        .orElseGet(basket::stream);

List<Fruit> result = withoutSmallestPear
        .sorted(comparing(Fruit::getSize))
        .limit(20)
        .collect(toList());

Upvotes: 8

Eugene
Eugene

Reputation: 121028

As far as I can tell this has custom written all over it, so I did try a custom collector here:

private static <T> Collector<T, ?, List<T>> exceptCollector(Predicate<T> predicate, int size, Comparator<T> comparator) {

    class Acc {

        private TreeSet<T> matches = new TreeSet<>(comparator);

        private TreeSet<T> doesNot = new TreeSet<>(comparator);

        void accumulate(T t) {
            if (predicate.test(t)) {
                matches.add(t);
            } else {
                doesNot.add(t);
            }
        }

        Acc combine(Acc other) {

            matches.addAll(other.matches);
            doesNot.addAll(other.doesNot);

            return this;
        }

        List<T> finisher() {
            T smallest = matches.first();
            if (smallest != null) {
                matches.remove(smallest);
            }

            matches.addAll(doesNot);
            return matches.stream().limit(size).collect(Collectors.toList());
        }

    }
    return Collector.of(Acc::new, Acc::accumulate, Acc::combine, Acc::finisher);
}

And usage would be:

List<Fruit> fruits = basket.getFruits()
            .stream()
            .collect(exceptCollector(Fruit::isPear, 20, Comparator.comparing(Fruit::getSize)));

Upvotes: 7

yegodm
yegodm

Reputation: 1044

The key action is to sort by type and size in such a way that the smallest pear goes first. Something like that:

// create a dummy pear; size value does not matter as comparing by ref
final Pear dummy = new Pear(-1);
basket
   // mix basket with the dummy pear
   .concat(basket, Stream.of(dummy))
      // sort by type so pears go first, then by size
      .sorted(Comparator
          .<Fruit>comparingInt(
              // arrange the dummy to always be the last 
              // among other pears but before other types 
              f -> (f == dummy ? 
                 0 : 
                 (Pear.class.equals(f.getClass()) ? -1 : 1))
          )
          .thenComparing(f -> f.size)
      )
      // skip the smallest pear
      .skip(1)
      // filter out the dummy
      .filter(f -> f != dummy)
      // sort again the rest by size
      .sorted(Comparator.comparingInt(f -> f.size))
      // take 20 at max
      .limit(20);

Upvotes: 3

xenteros
xenteros

Reputation: 15852

For easier implementation, I attach an example for:

class Fruit {
    String name;
    Long size;
}

The following will work:

Comparator<Fruit> fruitComparator = (o1, o2) -> {

    if (o1.getName().equals("Peach") && o2.getName().equals("Peach")) {
        return o2.getSize().compareTo(o1.getSize()); //reverse order of Peaches
    }

    if (o1.getName().equals("Peach")) {
        return 1;
    }
    if (o2.getName().equals("Peach")) {
        return -1;
    }
    return o1.getSize().compareTo(o2.getSize());
};

And:

output = basket.stream()
    .sorted(Comparator.comparing(Fruit::getSize))
    .limit(21)
    .sorted(fruitComparator)
    .limit(20)
    .sorted(Comparator.comparing(Fruit::getSize))
    .collect(fruitCollector);

My comparator will put the smallest Peach to the 21st position, will keep the order of other Fruits natural, so in case there is not Peach, it will return 21st largest element. Then I sort the rest in the normal order.

This will work. It's a hack and at some circumstances might be a bad choice. I'd like to point out, that sorting 20 elements shouldn't be an issue.

Upvotes: 5

Viktor Mellgren
Viktor Mellgren

Reputation: 4506

Something like this might work (however groups into 2 baskets as you mentioned)

    Function<Fruit, Boolean> isPear = f -> f.getType().equals("Pear");
    Comparator<Fruit> fruitSize = Comparator.comparing(Fruit::getSize);
    Map<Boolean, List<Fruit>> pearsAndOthers = basket.sorted(fruitSize).limit(21).collect(Collectors.groupingBy(isPear));

    List<Fruit> pears = pearsAndOthers.get(true);
    List<Fruit> others = pearsAndOthers.get(false);

    Stream<Fruit> result;
    if (pears.size() == 0) {
        result = others.stream().limit(20);
    } else if (pears.size() == 1) {
        result = others.stream();
    } else {
        // You can probably merge in a nicer fashion since they should be sorted
        result = Stream.concat(pears.stream().skip(1), others.stream()).sorted(fruitSize);
    }

Upvotes: -1

pedromss
pedromss

Reputation: 2453

You can use a stateful predicate:

class StatefulPredicate<T> implements Predicate<T> {

    private boolean alreadyFiltered;
    private Predicate<T> pred;

    public StatefulPredicate(Predicate<T> pred) {
        this.pred = pred;
        this.alreadyFiltered = false;
    }

    @Override
    public boolean test(T t) {
        if(alreadyFiltered) {
            return true;
        }

        boolean result = pred.test(t);
        alreadyFiltered = !result;
        return result;
    }
}

    Stream.of(1, -1, 3, -4, -5, 6)
        .filter(new StatefulPredicate<>(i -> i > 0))
        .forEach(System.out::println);

Prints: 1, 3, -4, -5, 6

If concurrency is an issue you can use an atomic boolean.

If you wish to skip more than 1 element, add that parameter to your constructor and build your logic inside StatefulPredicate

This predicate filters the first negative element and then lets every other element pass, regardless. In your case you should test for instanceof Pear

Edit

Since people showed concerns about filter being stateless, from the documentation:

Intermediate operations are further divided into stateless and stateful operations. Stateless operations, such as filter and map, retain no state from previously seen element when processing a new element -- each element can be processed independently of operations on other elements. Stateful operations, such as distinct and sorted, may incorporate state from previously seen elements when processing new elements.

That predicate doesn't retain information about previously seen elements. It retains information about previous results.

Also it can be made thread safe to avoid concurrency issues.

Upvotes: 3

Related Questions