Reputation: 8768
I have only a superficial understanding of the Java 8 Collection streaming capabilities, so I'm not sure whether the following is even possible: I'd like to filter a collection based on an integer comparison and reuse the computed value for the mapping.
Concretely, I have a Collection<String> strings and would like to map each of its values to its Levenshtein distance to a fixed String x, but only if that distance is less than a value levenshteinLimit.
String x = "some string";
Collection<String> strings = new LinkedList<>(Arrays.asList("not some string",
"some other string"));
int levenshteinLimit = 10;
Map<Integer, String> stringsLevenshteinMap = strings.stream()
.filter(string -> LevenshteinDistance.getDefaultInstance().apply(x, string) < levenshteinLimit)
.collect(Collectors.toMap(string -> LevenshteinDistance.getDefaultInstance().apply(x, string), Function.identity()));
System.out.println(stringsLevenshteinMap);
This works well and expresses the result I'm looking for, but it computes the distance twice for every string that passes the filter. This is not a problem so far. A solution without streams is possible as well. I'm trying to learn something new.
I assume that it's more efficient to filter first and then map because the number of objects is potentially smaller which means less work.
The code above uses Apache commons-text 1.1.
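For comparison, the non-stream solution mentioned above could look roughly like the following sketch. The class name LoopSketch, the method name distancesBelow and the ToIntFunction parameter are illustrative assumptions, and the length difference in main is only a stand-in metric so that the snippet runs without commons-text.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

final class LoopSketch {

    // Maps each qualifying distance to its string.
    // 'distance' is a placeholder for the real metric and is evaluated once per string.
    static Map<Integer, String> distancesBelow(Collection<String> strings,
                                               ToIntFunction<String> distance,
                                               int limit) {
        Map<Integer, String> result = new HashMap<>();
        for (String s : strings) {
            int d = distance.applyAsInt(s); // single computation, reused below
            if (d < limit) {
                // A later string with the same distance overwrites an earlier one.
                result.put(d, s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in metric (difference in length), not a real Levenshtein distance.
        String x = "some string";
        ToIntFunction<String> lengthDiff = s -> Math.abs(s.length() - x.length());
        System.out.println(distancesBelow(
                Arrays.asList("not some string", "some other string"), lengthDiff, 10));
    }
}
```

With commons-text on the classpath, the metric would presumably be passed as s -> LevenshteinDistance.getDefaultInstance().apply(x, s).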
Upvotes: 2
Views: 3151
Reputation: 298399
If you want to avoid a temporary object holding the key and value, you need a custom collector that retraces what the built-in collector does but incorporates the filtering directly. By the way, I don’t think that Collectors.toMap is appropriate here, as there is no guarantee that there will be only one string for each distance. Therefore, I used Collectors.groupingBy as a template instead:
public static <T> Collector<T,?,Map<Integer,List<T>>>
grouping(ToIntFunction<T> f, int limit) {
return Collector.of(HashMap::new,
(m,t) -> {
int v = f.applyAsInt(t);
if(v < limit) m.computeIfAbsent(v, x -> new ArrayList<>()).add(t);
},
(m1,m2) -> {
m2.forEach((k,v) -> m1.merge(k, v, (l1,l2)->{ l1.addAll(l2); return l1; }));
return m1;
});
}
This basically does what Collectors.groupingBy does, but restricts its usage to a key function evaluating to an int number and only processing elements mapping to numbers below a specified limit. It would also be possible to generalize this to use a Function and a Predicate instead.
This can be used like
Map<Integer, List<String>> stringsLevenshteinMap
= Stream.of("not some string", "some other string")
.collect(grouping(
string -> LevenshteinDistance.getDefaultInstance().apply(x, string),
levenshteinLimit));
But it must be emphasized that there is no guarantee that this performs better than just creating a temporary object holding the two values; this depends on a lot of environmental factors. Creating temporary objects is not expensive in general. The approach using a holder object is much more flexible, most notably when you want to change the stream operation later on.
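The generalization to a Function and a Predicate mentioned above might look like the following sketch. The names FilteredGrouping and groupingIf are made up for illustration, and the demo in main groups by string length as a stand-in key function rather than by a real Levenshtein distance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collector;
import java.util.stream.Stream;

final class FilteredGrouping {

    // Groups elements by key, keeping only elements whose key passes keyFilter.
    // The key function is evaluated exactly once per element.
    static <T, K> Collector<T, ?, Map<K, List<T>>> groupingIf(
            Function<? super T, ? extends K> keyFunction,
            Predicate<? super K> keyFilter) {
        return Collector.of(
                HashMap::new,
                (map, element) -> {
                    K key = keyFunction.apply(element);
                    if (keyFilter.test(key)) {
                        map.computeIfAbsent(key, k -> new ArrayList<>()).add(element);
                    }
                },
                // Combiner for parallel streams: merge the right map into the left one.
                (left, right) -> {
                    right.forEach((key, list) ->
                            left.merge(key, list, (l1, l2) -> { l1.addAll(l2); return l1; }));
                    return left;
                });
    }

    public static void main(String[] args) {
        // Stand-in key function (string length) so the sketch runs without commons-text.
        Map<Integer, List<String>> byLength = Stream.of("a", "bb", "cc", "dddd")
                .collect(groupingIf(String::length, len -> len < 3));
        System.out.println(byLength);
    }
}
```

For the question's use case, the call would presumably be groupingIf(s -> LevenshteinDistance.getDefaultInstance().apply(x, s), d -> d < levenshteinLimit).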
Upvotes: 1
Reputation: 691943
First, you could make the code faster by not creating an unnecessary LinkedList.
Now, regarding your question, if you want to keep using streams to do that, then the solution would be to map each of the strings to an object containing the string and its distance, then filter these objects, then collect to a map:
String x = "some string";
int levenshteinLimit = 10;
List<String> strings = Arrays.asList("not some string", "some other string");
Map<Integer, String> stringsLevenshteinMap =
strings.stream()
.map(string -> new StringWithDistance(string, LevenshteinDistance.getDefaultInstance().apply(x, string)))
.filter(o -> o.getDistance() < levenshteinLimit)
.collect(Collectors.toMap(StringWithDistance::getDistance, StringWithDistance::getString));
System.out.println(stringsLevenshteinMap);
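The StringWithDistance class used above is not defined in this answer. A minimal sketch of what it might look like, together with the pipeline, follows. The field names, the HolderSketch wrapper, and the ToIntFunction parameter are assumptions. The merge function passed to toMap is also an addition that is not in the code above: without it, Collectors.toMap throws an IllegalStateException as soon as two strings share a distance.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;
import java.util.stream.Collectors;

// Hypothetical holder class pairing a string with its precomputed distance.
final class StringWithDistance {
    private final String string;
    private final int distance;

    StringWithDistance(String string, int distance) {
        this.string = string;
        this.distance = distance;
    }

    String getString() { return string; }
    int getDistance() { return distance; }
}

final class HolderSketch {

    // 'distance' is a placeholder for the real metric and is evaluated once per string.
    static Map<Integer, String> distancesBelow(List<String> strings,
                                               ToIntFunction<String> distance,
                                               int limit) {
        return strings.stream()
                .map(s -> new StringWithDistance(s, distance.applyAsInt(s)))
                .filter(o -> o.getDistance() < limit)
                // Keeps the first string when two strings share the same distance.
                .collect(Collectors.toMap(StringWithDistance::getDistance,
                                          StringWithDistance::getString,
                                          (first, second) -> first));
    }

    public static void main(String[] args) {
        // Stand-in metric (difference in length), not a real Levenshtein distance.
        String x = "some string";
        System.out.println(distancesBelow(
                Arrays.asList("not some string", "some other string"),
                s -> Math.abs(s.length() - x.length()), 10));
    }
}
```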
Upvotes: 0
Reputation: 15008
Something with a Tuple intermediate object should work:
Map<Integer, String> stringsLevenshteinMap = strings.stream()
.map(s -> new Tuple<>(LD.getInstance().apply(x, s), s))
.filter(t -> t.getFirst() < maxDistance)
.collect(Collectors.toMap(Tuple::getFirst, Tuple::getSecond));
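The JDK has no Tuple class, so the snippet above presumably relies on one from a library or on a hand-written one. As a sketch that needs nothing beyond the JDK, java.util.AbstractMap.SimpleImmutableEntry can play the same role. The class name EntrySketch, the ToIntFunction parameter and the merge function (which keeps the first string on duplicate distances) are assumptions added for illustration.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;
import java.util.stream.Collectors;

final class EntrySketch {

    // 'distance' is a placeholder for the real metric and is evaluated once per string.
    static Map<Integer, String> distancesBelow(List<String> strings,
                                               ToIntFunction<String> distance,
                                               int limit) {
        return strings.stream()
                // Entry key = distance, entry value = original string.
                .map(s -> new SimpleImmutableEntry<>(distance.applyAsInt(s), s))
                .filter(e -> e.getKey() < limit)
                .collect(Collectors.toMap(Map.Entry::getKey,
                                          Map.Entry::getValue,
                                          (first, second) -> first));
    }

    public static void main(String[] args) {
        // Stand-in metric (difference in length), not a real Levenshtein distance.
        String x = "some string";
        System.out.println(distancesBelow(
                Arrays.asList("not some string", "some other string"),
                s -> Math.abs(s.length() - x.length()), 10));
    }
}
```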
Upvotes: 5
Reputation: 5459
Off the top of my head, I think this should work:
Map<Integer, String> stringsLevenshteinMap = strings.stream()
.map(string -> LevenshteinDistance.getDefaultInstance().apply(x, string))
.filter(val -> val < levenshteinLimit)
.collect(Collectors.toMap(val -> val, Function.identity()));
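But as I said, this is off the top of my head, so I can't guarantee that it works or even compiles. The idea should be clear, though: instead of calculating things twice, we map to the calculated values and continue from there with the filtering and collecting. Note that after the map step the stream carries only the distances, so the original strings would have to be carried along (for example in a small holder object) to end up as the values of the map.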
Upvotes: -1