Reputation: 8768

How to filter and map on a value in a Java 8 stream without redundant calculation?

I have superficial insight into Java 8 Collection streaming capabilities, so I'm not sure if the following is even possible: I'd like to filter a collection based on an integer comparison and reuse that value for the mapping.

Concretely I have a Collection<String> strings and would like to map each of its values to the Levenshtein distance to a fixed String x if the Levenshtein distance is less than a value levenshteinLimit.

String x = "some string";
Collection<String> strings = new LinkedList<>(Arrays.asList("not some string",
        "some other string"));
int levenshteinLimit = 10;
Map<Integer, String> stringsLevenshteinMap = strings.stream()
        .filter(string -> LevenshteinDistance.getDefaultInstance().apply(x, string) < levenshteinLimit)
        .collect(Collectors.toMap(string -> LevenshteinDistance.getDefaultInstance().apply(x, string), Function.identity()));
System.out.println(stringsLevenshteinMap);

This works well and expresses the result I'm looking for, but it requires the redundant calculation of the distance. This is not a problem so far. A solution without streams is possible as well. I'm trying to learn something new.

I assume that it's more efficient to filter first and then map because the number of objects is potentially smaller which means less work.

The code above uses Apache commons-text 1.1.

Upvotes: 2

Answers (4)

Holger

Reputation: 298399

If you want to avoid a temporary object holding the key and value, you need a custom collector, retracing what the built-in collector does, but incorporating the filtering directly. By the way, I don’t think that Collectors.toMap is appropriate here, as there is no guarantee that there will be only one string for each distance. Therefore, I used Collectors.groupingBy as a template instead:

public static <T> Collector<T,?,Map<Integer,List<T>>>
                  grouping(ToIntFunction<T> f, int limit) {
    return Collector.of(HashMap::new,
        (m,t) -> {
            int v = f.applyAsInt(t);
            if(v < limit) m.computeIfAbsent(v, x -> new ArrayList<>()).add(t);
        },
        (m1,m2) -> {
            m2.forEach((k,v) -> m1.merge(k, v, (l1,l2)->{ l1.addAll(l2); return l1; }));
            return m1;
        });
}

This basically does what Collectors.groupingBy does, but restricts its usage to a key function evaluating to an int number and only processing elements mapping to numbers below a specified limit. It would also be possible to generalize this to use a Function and a Predicate instead.
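
The generalization mentioned above could look roughly like the following. This is only a sketch: the class name FilteredGrouping and the method name groupingIf are made up for illustration, and the demo groups by string length rather than by Levenshtein distance so that it runs without any library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class FilteredGrouping {

    // Hypothetical generalization: the key is computed once per element,
    // and elements whose key fails the predicate are dropped.
    public static <T, K> Collector<T, ?, Map<K, List<T>>> groupingIf(
            Function<T, K> keyFunction, Predicate<K> keep) {
        return Collector.of(
            HashMap::new,
            (m, t) -> {
                K key = keyFunction.apply(t);
                if (keep.test(key)) m.computeIfAbsent(key, k -> new ArrayList<>()).add(t);
            },
            (m1, m2) -> {
                // merge the lists of both partial maps (only used for parallel streams)
                m2.forEach((k, v) -> m1.merge(k, v, (l1, l2) -> { l1.addAll(l2); return l1; }));
                return m1;
            });
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> byLength = Stream.of("a", "bb", "cc", "dddd")
            .collect(groupingIf(String::length, len -> len < 3));
        System.out.println(byLength); // prints {1=[a], 2=[bb, cc]}
    }
}
```

For the question's case, the key function would be the distance calculation and the predicate would be the comparison against levenshteinLimit.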

This can be used like

Map<Integer, List<String>> stringsLevenshteinMap
    = Stream.of("not some string", "some other string")
            .collect(grouping(
                string -> LevenshteinDistance.getDefaultInstance().apply(x, string),
                levenshteinLimit));

But it must be emphasized that there is no guarantee that this performs better than just creating a temporary object holding the two values; this depends on a lot of environmental factors. Creating temporary objects is not expensive in general. The approach using a holder object is much more flexible, most notably when you want to change the stream operation later on.

Upvotes: 1

JB Nizet

Reputation: 691943

First, you could make the code faster by not creating a useless LinkedList.

Now, regarding your question, if you want to keep using streams to do that, then the solution would be to map each of the strings to an object containing the string and its distance, then filter these objects, then collect to a map:

String x = "some string";
int levenshteinLimit = 10;

List<String> strings = Arrays.asList("not some string", "some other string");
Map<Integer, String> stringsLevenshteinMap = 
    strings.stream()
           // StringWithDistance is a simple holder class with getString() and getDistance()
           .map(string -> new StringWithDistance(string, LevenshteinDistance.getDefaultInstance().apply(x, string)))
           .filter(o -> o.getDistance() < levenshteinLimit)
           .collect(Collectors.toMap(StringWithDistance::getDistance, StringWithDistance::getString));

System.out.println(stringsLevenshteinMap);
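
The StringWithDistance class is not shown above, so here is a minimal self-contained sketch of what it might look like. The holder's fields and constructor are assumptions, and the levenshtein method is a plain dynamic-programming stand-in for the commons-text call so that the example runs without the library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class HolderSketch {

    // Hypothetical holder class pairing a string with its precomputed distance.
    public static final class StringWithDistance {
        private final String string;
        private final int distance;

        public StringWithDistance(String string, int distance) {
            this.string = string;
            this.distance = distance;
        }

        public String getString() { return string; }
        public int getDistance() { return distance; }
    }

    // Stand-in for LevenshteinDistance.getDefaultInstance().apply(a, b):
    // classic two-row dynamic-programming edit distance.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.length()];
    }

    // The distance is computed once in map() and reused by filter() and collect().
    public static Map<Integer, String> distancesBelow(String x, List<String> strings, int limit) {
        return strings.stream()
            .map(s -> new StringWithDistance(s, levenshtein(x, s)))
            .filter(o -> o.getDistance() < limit)
            .collect(Collectors.toMap(StringWithDistance::getDistance, StringWithDistance::getString));
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("not some string", "some other string");
        // prints {4=not some string, 6=some other string}
        System.out.println(distancesBelow("some string", strings, 10));
    }
}
```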

Upvotes: 0

daniu

Reputation: 15008

Something with a Tuple intermediate object should work:

Map<Integer, String> stringsLevenshteinMap = strings.stream()
    .map(s -> new Tuple<>(LevenshteinDistance.getDefaultInstance().apply(x, s), s))
    .filter(t -> t.getFirst() < levenshteinLimit)
    .collect(Collectors.toMap(Tuple::getFirst, Tuple::getSecond));
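
Java 8 has no built-in Tuple class, so one option is to let AbstractMap.SimpleEntry play that role. The sketch below does that and collects with groupingBy instead of toMap, because toMap throws an IllegalStateException when two strings produce the same distance. The class name EntryAsTuple, the method groupByDistance and the length-gap distance in main are assumptions for illustration only; the real Levenshtein call would be passed in its place.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntBiFunction;
import java.util.stream.Collectors;

public class EntryAsTuple {

    // Pairs each string with its distance (computed once), filters on the
    // distance, then groups the strings so that equal distances do not collide.
    public static Map<Integer, List<String>> groupByDistance(
            String x, List<String> strings, int limit,
            ToIntBiFunction<String, String> distance) {
        return strings.stream()
            .map(s -> new SimpleEntry<>(distance.applyAsInt(x, s), s))
            .filter(e -> e.getKey() < limit)
            .collect(Collectors.groupingBy(
                e -> e.getKey(),
                Collectors.mapping(e -> e.getValue(), Collectors.toList())));
    }

    public static void main(String[] args) {
        // Hypothetical cheap distance (difference in length), used only for the demo.
        ToIntBiFunction<String, String> lengthGap = (a, b) -> Math.abs(a.length() - b.length());
        List<String> strings = Arrays.asList("abcd", "abce", "abcdefghij");
        System.out.println(groupByDistance("abc", strings, 3, lengthGap)); // prints {1=[abcd, abce]}
    }
}
```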

Upvotes: 5

Lothar

Reputation: 5459

Off the top of my head, I think this should work:

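Map<Integer, String> stringsLevenshteinMap = strings.stream()
    // keep the string next to its calculated distance so it is still available when collecting
    .map(string -> new AbstractMap.SimpleEntry<>(LevenshteinDistance.getDefaultInstance().apply(x, string), string))
    .filter(entry -> entry.getKey() < levenshteinLimit)
    .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));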

But as I said, this is off the top of my head, so I can't guarantee that it works or even compiles. The idea should be clear, though: instead of calculating things twice, we map to the calculated values and continue from there with the filtering and collecting.

Upvotes: -1
