Return count and list of sentences where word appears using Java Streams

Question

I'm stuck trying to find out which sentences each word appears in. The input would be a list of sentences:

Question, what kind of wine is best? 
White wine.
A question

and the output would be

// format would be: word:{count: sentence1, sentence2,...}
a:{1:3} 
wine:{2:1,2} 
best:{1:1} 
is:{1:1} 
kind:{1:1} 
of:{1:1} 
question:{2:1,3} 
what:{1:1}
white:{1:2}

This is what I have so far:

static void getFrequency(List<String> inputLines) {
  List<String> list = inputLines.stream()
     .map(w -> w.split("[^a-zA-Z0-9]+"))
     .flatMap(Arrays::stream)
     .map(String::toLowerCase)
     .collect(Collectors.toList());

   Map<String, Integer> wordCounter = list.stream()
     .collect(Collectors.toMap(w -> w, w -> 1, Integer::sum));
}

With that I only get the number of times each word appears across all the sentences, but I also need the list of sentences where each word appears. It looks like I could use IntStream.range to get the sentence ids, something like this:

 IntStream.range(1, inputLines.size())
          .mapToObj(i -> inputLines.get(i));

But I'm not sure if that is the best way to do it; I'm new to Java.

ernest_k · Accepted Answer

You can use a grouping collector to build a map from each word to the list of sentence numbers it appears in. Here's an example:

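import java.util.*;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map.Entry;
import java.util.function.Function;
import java.util.stream.*;

// Pairs every lower-cased word with its 1-based sentence number,
// then groups the sentence numbers by word.
private static Map<String, List<Integer>> getFrequency(List<String> inputLines) {
    return IntStream.range(0, inputLines.size())
            .mapToObj(line -> Arrays.stream(inputLines.get(line)
                 .split("[^a-zA-Z0-9]+"))
                 .map(word -> new SimpleEntry<>(word.toLowerCase(), line + 1)))
            .flatMap(Function.identity())
            .collect(Collectors.groupingBy(Entry::getKey, 
                  Collectors.mapping(Entry::getValue, Collectors.toList())));
}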

With your test data, I get

{a=[3], what=[1], white=[2], question=[1, 3], kind=[1], 
 of=[1], best=[1], is=[1], wine=[1, 2]}

The count is easy to infer from the list size, so there should be no need for an additional class.
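
To get from that map to the word:{count:sentences} format in the question, something like the sketch below could work. The class name WordSentenceIndex and the format helper are made up for illustration. The empty-token filter and the alphabetical sort are additions that are not part of the code above. The count is simply the size of each list.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical wrapper class, used only for this illustration.
class WordSentenceIndex {

    // Maps each lower-cased word to the 1-based numbers of the sentences it occurs in.
    static Map<String, List<Integer>> getFrequency(List<String> inputLines) {
        return IntStream.range(0, inputLines.size())
                .boxed()
                .flatMap(line -> Arrays.stream(inputLines.get(line).split("[^a-zA-Z0-9]+"))
                        // Assumption: skip the empty token that split() yields
                        // when a line starts with punctuation.
                        .filter(word -> !word.isEmpty())
                        .map(word -> new SimpleEntry<>(word.toLowerCase(), line + 1)))
                .collect(Collectors.groupingBy(Entry::getKey,
                        Collectors.mapping(Entry::getValue, Collectors.toList())));
    }

    // Hypothetical helper: renders one "word:{count:s1,s2,...}" line per word,
    // sorted alphabetically. The count is the size of the sentence list.
    static String format(Map<String, List<Integer>> index) {
        return index.entrySet().stream()
                .sorted(Map.Entry.comparingByKey())
                .map(e -> e.getKey() + ":{" + e.getValue().size() + ":"
                        + e.getValue().stream()
                                .map(String::valueOf)
                                .collect(Collectors.joining(","))
                        + "}")
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Question, what kind of wine is best? ",
                "White wine.",
                "A question");
        System.out.println(format(getFrequency(lines)));
    }
}
```

With the three sentences from the question, this should print lines such as wine:{2:1,2} and question:{2:1,3}. Note that a word occurring twice in the same sentence is listed twice and counted twice; if you want one entry per sentence, collecting the values into a set instead of a list would be one option.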
