Reputation: 2050
I'm stuck on an edge case in java stream manipulations...
I want to code the following behavior: "From an arbitrary basket of fruits, collect the 20 smallest, except the smallest pear, because we don't want that."
Added bonus: the baskets to come might not have any pear at all.
Examples :
So far, I'm at this step:
output = basket.stream()
.sorted(Comparator.comparing(Fruit::getSize))
//.filter(???)
.limit(20)
.collect(fruitCollector);
This seems like a case of stateful lambda filter, and I don't know how to do that.
I can't use a local firstPear
boolean and set it to true
after filtering the first pear, since all local variables in a lambda must be final.
Worst case scenario I can split the basket in two, pears and non-pears, sort the pears, and sublist them appropriately if there is any. This seems very inefficient and ugly. Is there a better way?
There was much variety in the answers posted here, and most of them are valid. In order to give back to the community, I put together a small testing harness to compare the performance of these algorithms.
This comparison was not as extensive as I wanted - It's been 3 weeks already. It only covers usage for sequential processing of simple items. Feel free to give the testing harness a go, and add more tests, more benchmarks, or your own implementation.
My analysis:
Algorithm | Author | Perf | Comments -------------------------------------------------------------------------------- Indexed removal | Holger | Best | Best overall, somewhat obscure Stateful predicate | pedromss | Best | Do not use for parallel processing Straightforward approach | Misha | Best | Better when few elements match Custom collector | Eugene | Good | Better when all or no element match Comaprator hack w/ dummy | yegodm | Good | - Comparator hack | xenteros | * | Perf sensitive to output size, fails on edge cases.
I acecpted pedromss' answer, as it's the one we implemented in the project, due to both its good performance, and "black-box" capabilities (the state-managing code is in an external class, and contributors can focus on the business logic).
Note that the accepted answer might not be the best for you: review the others, or check my testing project to see for yourself.
Upvotes: 18
Views: 4950
Reputation: 1
I have the same problem but solved by myself using a Map and ignore list. Here's the sample for your information. Hope can help.
@Test
public void testGetStckTraceElements() {
StackTraceElement[] stElements = Thread.currentThread().getStackTrace();
// define a list for filter out
List<String> ignoreClasses = Arrays.asList(
Thread.class.getName(),
this.getClass().getName()
);
// Map is using for check found before or not
Map<String,Boolean> findFrist = new HashMap<String,Boolean>();
Arrays.asList(stElements).stream()
.filter(s -> {
Platform.print("check: {}", s.getClassName());
if (Optional.ofNullable(findFrist.get(s.getClassName())).orElse(false)) {
return true;
}
findFrist.put(s.getClassName(), true);
for (String className:ignoreClasses) {
if (s.getClassName().equals(className)) return false;
}
return true;
})
.forEach(s->{
Platform.print("Result: {} {} {} {}", s.getClassName(), s.getMethodName(), s.getFileName(), s.getLineNumber());
});
}
Upvotes: 0
Reputation: 298539
Don’t try to filter upfront. Consider
List<Fruit> output = basket.stream()
.sorted(Comparator.comparing(Fruit::getSize))
.limit(21)
.collect(Collectors.toCollection(ArrayList::new));
int index = IntStream.range(0, output.size())
.filter(ix -> output.get(ix).isPear())
.findFirst().orElse(20);
if(index < output.size()) output.remove(index);
Just limit to 21
elements instead of 20
to be able to remove one. By using Collectors.toCollection(ArrayList::new)
you ensure to receive a mutable collection.
Then, there are three scenarios
The list contains a Pear
. Since the list is sorted by the fruit sizes, the first Pear
will also be the smallest Pear
, which is the one that has to be removed. The subsequent … .findFirst()
will evaluate to the index of the element.
The list does not contain a Pear
but has a size of 21
. In this case, we have to remove the last element, i.e. at index 20
, to get the desired result size. This is provided by .orElse(20)
, which will map an empty OptionalInt
to 20
.
The list may not contain any Pear
and be smaller than 21
, because the source list was already smaller. In this case, we don’t remove any elements, checked by prepending the remove
operation with if(index < output.size())
.
This entire post-processing can be considered irrelevant to the performance, as we already know beforehand, that it will be applied to a very small list, having at most 21
elements in this example. This is independent to the size of the source basket
list.
Upvotes: 2
Reputation: 30696
I think the Predicate
is the atomic operator of your operation. So the simplest way is write your own Predicate
to wrap the original Predicate
. let's say the wrap named as once
then your code can be simplify down to as below:
output = basket.stream().sorted(comparing(Fruit::getSize))
.filter(once(Fruit::isPear))
.limit(20).collect(fruitCollector);
static <T> Predicate<T> once(Predicate<T> predicate){
boolean[] seen = {true};
return it -> !seen[0] || (seen[0]=predicate.test(it));
}
If you want to support concurrent you can use an AtomicInteger
instead, for example:
static <T> Predicate<T> once(Predicate<T> predicate){
AtomicInteger seen = new AtomicInteger(0);
return it -> {
//if seen==0 then test predicate, otherwise increment only
IntBinaryOperator accumulator = (x,y)-> x==0 && predicate.test(it) ? x : x+y;
return seen.accumulateAndGet(1, accumulator) != 1;
};
}
Upvotes: 0
Reputation: 637
[Update], after reading the updated OP, I have a better understanding about the requirements: Here is the updated code by StreamEx:
Optional<Integer> smallestPear = StreamEx.of(basket).filter(Fruit::isPear)
.mapToInt(Fruit::getSize).min();
StreamEx.of(basket)
.chain(s -> smallestPear.map(v -> s.remove(f -> f.isPear() && f.getSize() == v).orElse(s))
.sortedBy(Fruit::getSize).limit(20).toList();
[update again] The above solution is pretty similar with the solution provided by Misha. if you don't want to go through the stream twice, Here is another solution by limited Predicate if the pair of (fruit type, size) in basket are unique:
// Save this method in your toolkit.
public class Fn {
public static <T> Predicate<T> limited(final Predicate<T> predicate, final int limit) {
Objects.requireNonNull(predicate);
return new Predicate<T>() {
private final AtomicInteger counter = new AtomicInteger(limit);
@Override
public boolean test(T t) {
return predicate.test(t) && counter.decrementAndGet() >= 0;
}
};
}
}
StreamEx.of(basket).sortedBy(Fruit::getSize)
.remove(f -> Fn.limited(Fruit::isPear, 1))
.limit(20).toList();
Upvotes: 0
Reputation: 28173
Have you considered a straightforward approach? Find the smallest pear, filter it out (if it exists) and collect 20 smallest:
Optional<Fruit> smallestPear = basket.stream()
.filter(Fruit::isPear) // or whatever it takes to test if it's a pear
.min(Fruit::getSize);
Stream<Fruit> withoutSmallestPear = smallestPear
.map(p -> basket.stream().filter(f -> f != p))
.orElseGet(basket::stream);
List<Fruit> result = withoutSmallestPear
.sorted(comparing(Fruit::getSize))
.limit(20)
.collect(toList());
Upvotes: 8
Reputation: 121028
As far as I can tell this has custom written all over it, so I did try a custom collector here:
private static <T> Collector<T, ?, List<T>> exceptCollector(Predicate<T> predicate, int size, Comparator<T> comparator) {
class Acc {
private TreeSet<T> matches = new TreeSet<>(comparator);
private TreeSet<T> doesNot = new TreeSet<>(comparator);
void accumulate(T t) {
if (predicate.test(t)) {
matches.add(t);
} else {
doesNot.add(t);
}
}
Acc combine(Acc other) {
matches.addAll(other.matches);
doesNot.addAll(other.doesNot);
return this;
}
List<T> finisher() {
T smallest = matches.first();
if (smallest != null) {
matches.remove(smallest);
}
matches.addAll(doesNot);
return matches.stream().limit(size).collect(Collectors.toList());
}
}
return Collector.of(Acc::new, Acc::accumulate, Acc::combine, Acc::finisher);
}
And usage would be:
List<Fruit> fruits = basket.getFruits()
.stream()
.collect(exceptCollector(Fruit::isPear, 20, Comparator.comparing(Fruit::getSize)));
Upvotes: 7
Reputation: 1044
The key action is to sort by type and size in such a way that the smallest pear goes first. Something like that:
// create a dummy pear; size value does not matter as comparing by ref
final Pear dummy = new Pear(-1);
basket
// mix basket with the dummy pear
.concat(basket, Stream.of(dummy))
// sort by type so pears go first, then by size
.sorted(Comparator
.<Fruit>comparingInt(
// arrange the dummy to always be the last
// among other pears but before other types
f -> (f == dummy ?
0 :
(Pear.class.equals(f.getClass()) ? -1 : 1))
)
.thenComparing(f -> f.size)
)
// skip the smallest pear
.skip(1)
// filter out the dummy
.filter(f -> f != dummy)
// sort again the rest by size
.sorted(Comparator.comparingInt(f -> f.size))
// take 20 at max
.limit(20);
Upvotes: 3
Reputation: 15852
For easier implementation, I attach an example for:
class Fruit {
String name;
Long size;
}
The following will work:
Comparator<Fruit> fruitComparator = (o1, o2) -> {
if (o1.getName().equals("Peach") && o2.getName().equals("Peach")) {
return o2.getSize().compareTo(o1.getSize()); //reverse order of Peaches
}
if (o1.getName().equals("Peach")) {
return 1;
}
if (o2.getName().equals("Peach")) {
return -1;
}
return o1.getSize().compareTo(o2.getSize());
};
And:
output = basket.stream()
.sorted(Comparator.comparing(Fruit::getSize))
.limit(21)
.sorted(fruitComparator)
.limit(20)
.sorted(Comparator.comparing(Fruit::getSize))
.collect(fruitCollector);
My comparator will put the smallest Peach to the 21st position, will keep the order of other Fruit
s natural, so in case there is not Peach
, it will return 21st largest element. Then I sort the rest in the normal order.
This will work. It's a hack and at some circumstances might be a bad choice. I'd like to point out, that sorting 20 elements shouldn't be an issue.
Upvotes: 5
Reputation: 4506
Something like this might work (however groups into 2 baskets as you mentioned)
Function<Fruit, Boolean> isPear = f -> f.getType().equals("Pear");
Comparator<Fruit> fruitSize = Comparator.comparing(Fruit::getSize);
Map<Boolean, List<Fruit>> pearsAndOthers = basket.sorted(fruitSize).limit(21).collect(Collectors.groupingBy(isPear));
List<Fruit> pears = pearsAndOthers.get(true);
List<Fruit> others = pearsAndOthers.get(false);
Stream<Fruit> result;
if (pears.size() == 0) {
result = others.stream().limit(20);
} else if (pears.size() == 1) {
result = others.stream();
} else {
// You can probably merge in a nicer fashion since they should be sorted
result = Stream.concat(pears.stream().skip(1), others.stream()).sorted(fruitSize);
}
Upvotes: -1
Reputation: 2453
You can use a stateful predicate:
class StatefulPredicate<T> implements Predicate<T> {
private boolean alreadyFiltered;
private Predicate<T> pred;
public StatefulPredicate(Predicate<T> pred) {
this.pred = pred;
this.alreadyFiltered = false;
}
@Override
public boolean test(T t) {
if(alreadyFiltered) {
return true;
}
boolean result = pred.test(t);
alreadyFiltered = !result;
return result;
}
}
Stream.of(1, -1, 3, -4, -5, 6)
.filter(new StatefulPredicate<>(i -> i > 0))
.forEach(System.out::println);
Prints: 1, 3, -4, -5, 6
If concurrency is an issue you can use an atomic boolean.
If you wish to skip more than 1 element, add that parameter to your constructor and build your logic inside StatefulPredicate
This predicate filters the first negative element and then lets every other element pass, regardless. In your case you should test for instanceof Pear
Since people showed concerns about filter being stateless, from the documentation:
Intermediate operations are further divided into stateless and stateful operations. Stateless operations, such as filter and map, retain no state from previously seen element when processing a new element -- each element can be processed independently of operations on other elements. Stateful operations, such as distinct and sorted, may incorporate state from previously seen elements when processing new elements.
That predicate doesn't retain information about previously seen elements. It retains information about previous results.
Also it can be made thread safe to avoid concurrency issues.
Upvotes: 3