Reputation: 39326
I'm stuck trying to find out which sentences each word appears in. The input would be a list of sentences:
Question, what kind of wine is best?
White wine.
A question
and the output would be
// format would be: word:{count: sentence1, sentence2,...}
a:{1:3}
wine:{2:1,2}
best:{1:1}
is:{1:1}
kind:{1:1}
of:{1:1}
question:{2:1,3}
what:{1:1}
white:{1:2}
This is what I get so far:
static void getFrequency(List<String> inputLines) {
    List<String> list = inputLines.stream()
            .map(w -> w.split("[^a-zA-Z0-9]+"))
            .flatMap(Arrays::stream)
            .map(String::toLowerCase)
            .collect(Collectors.toList());
    Map<String, Integer> wordCounter = list.stream()
            .collect(Collectors.toMap(w -> w, w -> 1, Integer::sum));
}
With that I only get the number of times each word appears across all the sentences, but I also need the list of sentences in which each word appears. It looks like I could use IntStream.range to get the sentence ids, something like this:
IntStream.range(1, inputLines.size())
        .mapToObj(i -> inputLines.get(i));
But I'm not sure whether that is the best way to do it; I'm new to Java.
Upvotes: 2
Views: 258
Reputation: 45309
You can use a grouping collector to build a map from each word to the list of sentence numbers it appears in. Here's an example:
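import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

private static Map<String, List<Integer>> getFrequency(List<String> inputLines) {
    return IntStream.range(0, inputLines.size())
            .mapToObj(line -> Arrays.stream(inputLines.get(line)
                    .split("[^a-zA-Z0-9]+"))
                    .map(word -> new SimpleEntry<>(word.toLowerCase(), line + 1)))
            .flatMap(Function.identity())
            .collect(Collectors.groupingBy(Entry::getKey,
                    Collectors.mapping(Entry::getValue, Collectors.toList())));
}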
With your test data, I get
{a=[3], what=[1], white=[2], question=[1, 3], kind=[1],
of=[1], best=[1], is=[1], wine=[1, 2]}
The count is easy to infer from the list size, so there should be no need for an additional class.
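If you want the exact word:{count: sentence1, sentence2,...} layout from the question, here is a minimal sketch of how the list size could be used as the count. The class name WordSentenceIndex and the helpers index and format are hypothetical names chosen for illustration. Two things differ from the method above and are my own assumptions: a TreeMap is used so the words print in alphabetical order, and empty tokens are filtered out in case a sentence starts with punctuation.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class WordSentenceIndex {

    // Maps each lower-cased word to the 1-based numbers of the sentences it occurs in.
    // A word occurring twice in one sentence contributes that sentence number twice.
    static Map<String, List<Integer>> index(List<String> inputLines) {
        return IntStream.range(0, inputLines.size())
                .boxed()
                .flatMap(line -> Arrays.stream(inputLines.get(line).split("[^a-zA-Z0-9]+"))
                        // assumption: skip the empty token produced by leading punctuation
                        .filter(word -> !word.isEmpty())
                        .map(word -> new SimpleEntry<>(word.toLowerCase(), line + 1)))
                // TreeMap (an assumption, not required) keeps the words sorted
                .collect(Collectors.groupingBy(Entry::getKey, TreeMap::new,
                        Collectors.mapping(Entry::getValue, Collectors.toList())));
    }

    // Renders one entry as word:{count:s1,s2,...}, taking the count from the list size.
    static String format(String word, List<Integer> sentences) {
        String ids = sentences.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
        return word + ":{" + sentences.size() + ":" + ids + "}";
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Question, what kind of wine is best?",
                "White wine.",
                "A question");
        index(lines).forEach((word, sentences) ->
                System.out.println(format(word, sentences)));
    }
}
```

With the three sentences from the question this prints one line per word, for example question:{2:1,3} and wine:{2:1,2}.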
Upvotes: 9