How to count words in Map via Stream

Question

I'm working with a List<String> that contains a large text. The text looks like:

List<String> lines = Arrays.asList("The first line", "The second line", "Some words can repeat", "The first the second"); // etc.

I need to count the words in it and produce this output:

first - 2
line - 2
second - 2
repeat - 1
some - 1
words - 1

Words shorter than 4 characters should be skipped, which is why "the" and "can" are not in the output. This is a simplified example; in the real task a word counts as rare if it occurs fewer than 20 times, and rare words should be skipped too. The map should then be sorted by key in alphabetical order. All of this should use only streams, without "if", "while", or "for" constructs.

What I have implemented:

Map<String, Integer> wordCount = Stream.of(lines)
                .flatMap(Collection::stream)
                .flatMap(str -> Arrays.stream(str.split("\\p{Punct}| |[0-9]|…|«|»|“|„")))
                .filter(str -> (str.length() >= 4))
                .collect(Collectors.toMap(
                        i -> i.toLowerCase(),
                        i -> 1,
                        (a, b) -> java.lang.Integer.sum(a, b))
                );

wordCount now maps each word to its number of occurrences. But how can I skip rare words? Should I create a new stream? If so, how do I get at the values of the Map? I tried this, but it's not correct:

 String result = Stream.of(wordCount)
         .filter(i -> (Map.Entry::getValue > 10));

My calculation should return a String:

"word" - number of entries

Thank you!

Most Noble Rabbit · Accepted Answer

Given the stream you already have:

List<String> lines = Arrays.asList(
        "For the rabbit, it was a bad day.",
        "An Antillean rabbit is very abundant.",
        "She put the rabbit back in the cage and closed the door securely, then ran away.",
        "The rabbit tired of her inquisition and hopped away a few steps.",
        "The Dean took the rabbit and went out of the house and away."
);

Map<String, Integer> wordCounts = Stream.of(lines)
        .flatMap(Collection::stream)
        .flatMap(str -> Arrays.stream(str.split("\\p{Punct}| |[0-9]|…|«|»|“|„")))
        .filter(str -> (str.length() >= 4))
        .collect(Collectors.toMap(
                String::toLowerCase,
                i -> 1,
                Integer::sum)
        );

System.out.println("Original:" + wordCounts);

Original output:

Original:{dean=1, took=1, door=1, very=1, went=1, away=3, antillean=1, abundant=1, tired=1, back=1, then=1, house=1, steps=1, hopped=1, inquisition=1, cage=1, securely=1, rabbit=5, closed=1}

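You can then stream the map's entries, filter out the rare words, sort by key, and join the result into a String: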

String results = wordCounts.entrySet()
        .stream()
        .filter(wordToCount -> wordToCount.getValue() > 2) // counts of 2 or fewer are treated as rare here
        .sorted(Map.Entry.comparingByKey())
        .map(wordCount -> wordCount.getKey() + " - " + wordCount.getValue())
        .collect(Collectors.joining(", "));

System.out.println(results);

Filtered output:

away - 3, rabbit - 5
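
For reference, here is a minimal self-contained sketch that puts both steps together. It is one possible variation, not the only way to do it. It assumes that splitting on any run of non-letter characters (`\\P{L}+`) is an acceptable simplification of the delimiter list above, and it collects into a TreeMap so the keys are already in alphabetical order. The names WordReport, frequentWords, minLength, and minCount are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class WordReport {

    // Assumption: any run of non-letter characters separates words.
    // This is simpler than the explicit delimiter list used above.
    private static final Pattern SEPARATORS = Pattern.compile("\\P{L}+");

    // minLength and minCount are illustrative parameters, not part of the original code.
    static String frequentWords(List<String> lines, int minLength, long minCount) {
        Map<String, Long> counts = lines.stream()
                .flatMap(SEPARATORS::splitAsStream)
                .filter(word -> word.length() >= minLength)
                .map(String::toLowerCase)
                // TreeMap keeps the keys sorted, so no sorted() step is needed afterwards.
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));

        return counts.entrySet().stream()
                // Keep only words that occur more than minCount times.
                .filter(entry -> entry.getValue() > minCount)
                .map(entry -> entry.getKey() + " - " + entry.getValue())
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "For the rabbit, it was a bad day.",
                "An Antillean rabbit is very abundant.",
                "She put the rabbit back in the cage and closed the door securely, then ran away.",
                "The rabbit tired of her inquisition and hopped away a few steps.",
                "The Dean took the rabbit and went out of the house and away."
        );
        System.out.println(frequentWords(lines, 4, 2)); // prints: away - 3, rabbit - 5
    }
}
```

Note that Collectors.counting() produces Long values rather than Integer. If you need Integer counts, the toMap version with Integer::sum shown earlier does the same job.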
