bhupen
bhupen

Reputation: 470

Java 8 - Count of words and then arrange in desc order

I have a list of words, say

List<String> words = Arrays.asList("Hello alan i am here where are you"+  
  "and what are you doing hello are you there");

How can I get the top seven words which are repeated more than once in the list in descending order? And then the single entry words should be arranged in alphabetical order. So the output for the above should be those top seven words

you (3)
are (2)
hello (2)
alan (1)
am (1)
and (1)
doing (1)

I am looking this to do in Java 8 using streams, lamda.

I am trying in this way. First sort the list Second get the map of words with its count of words in the list of words.

List<String> sortedWords = Arrays.asList("Hello alan i am here where are you and what are you doing hello you there".split(" "))
            .stream().sorted().collect(toList());

Map<String, Long> collect = 
            sortedWords.stream().collect(groupingBy(Function.identity(), counting()));

Upvotes: 4

Views: 6518

Answers (5)

Donald Raab
Donald Raab

Reputation: 6686

Sometimes the best solution to a problem is not an algorithm, but a data structure. I think what you need here is a Bag. Since you want the output sorted by occurrences and then by key, the specific data structure you should use is a TreeBag. The following code will work using Eclipse Collections with Java 8 Streams:

String string =
    "Hello alan i am here where are you and what are you doing hello are you there";
List<ObjectIntPair<String>> pairs =
    Stream.of(string.toLowerCase().split(" "))
        .collect(Collectors.toCollection(TreeBag::new))
        .topOccurrences(7);
System.out.println(pairs);

This code will output:

// Strings with occurrences
[are:3, you:3, hello:2, alan:1, am:1, and:1, doing:1, here:1, i:1, there:1, what:1, where:1]

The topOccurrences() method has logic to deal with ties, which basically leaves it to the developer to determine how they want to handle situations with ties. If you want exactly the first seven items from this list, you could then chain a call to .take(7);

The code can also be further simplified to:

List<ObjectIntPair<String>> pairs =
    TreeBag.newBagWith(string.split(" ")).topOccurrences(7);
System.out.println(pairs);

The static factory method TreeBag.newBagWith() accepts a vararg argument so you can just pass the result of String.split() directly to it.

Note: I am a committer for Eclipse Collections.

Upvotes: 2

Peter Walser
Peter Walser

Reputation: 15706

Two-step solution: group/count, then process by count descending

List<String> words = Arrays.asList("Hello alan i am here where are you and what are you doing hello you there".split(" "));

Map<String, Long> collect = words.stream()
        .map(String::toLowerCase) // convert to lower case
        .collect( // group and count by name
                Collectors.groupingBy(Function.identity(), Collectors.counting()));

collect.keySet().stream()
        .sorted( // order by count descending, then by name
                Comparator
                        .comparing(collect::get)
                        .reversed()
                        .thenComparing(Collator.getInstance()))
        .map(k -> k + " (" + collect.get(k) + ")") // map to name and count string
        .limit(7) // only first 7 entries
        .forEach(System.out::println); // output

Upvotes: 0

Tunaki
Tunaki

Reputation: 137114

The most difficult part is the sorting. Since you want to keep only the 7 first elements from the result and you want to sort the Map by its values, we need to create a Map of all the result, sort it and then keep the 7 results.

In the following code, every word is lower-cased and grouped by themselves, counting the number of occurences. Then, we need to sort this map so we create a Stream over the entries, sort them according to the values (in descending order) and then according to the keys. The 7 first elements are retained, mapped to their key (which correspond to the word) and collected into a List, thus keeping encounter order.

public static void main(String[] args) {
    String sentence = "Hello alan i am here where are you and what are you doing hello are you there";
    List<String> words = Arrays.asList(sentence.split(" "));

    List<String> result = 
            words.stream()
                 .map(String::toLowerCase)
                 .collect(groupingBy(identity(), counting()))
                 .entrySet().stream()
                 .sorted(Map.Entry.<String, Long> comparingByValue(reverseOrder()).thenComparing(Map.Entry.comparingByKey()))
                 .limit(7)
                 .map(Map.Entry::getKey)
                 .collect(toList());

    System.out.println(result);
}

Output:

[are, you, hello, alan, am, and, doing]

Note that you made a mistake in your wanted output: "are" actually appears 3 times like "you" so it should be before

NB: this code assumes a lot of static imports, namely:

import static java.util.Comparator.reverseOrder;
import static java.util.function.Function.identity;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;

Upvotes: 11

Tagir Valeev
Tagir Valeev

Reputation: 100249

While @Tunaki solution is great, it's interesting that using my StreamEx library, it's possible to solve the problem in single Stream pipeline (no actual operation will be performed until the single terminal operation is called):

Map<String, Long> map = StreamEx.of(words)
    .map(String::toLowerCase)
    .sorted() // sort original words, so now repeating words are next to each other
    .runLengths() // StreamEx feature: squash repeating words into Entry<String, Long>
    .sorted(Entry.<String, Long> comparingByValue().reversed()
                 .thenComparing(Entry.comparingByKey()))
    .limit(7) // Sort and limit
    .toCustomMap(LinkedHashMap::new); // Single terminal operation: store to LinkedHashMap

Or if only words are needed:

List<String> list =StreamEx.of(words)
    .map(String::toLowerCase)
    .sorted() // sort original words, so now repeating words are next to each other
    .runLengths() // StreamEx feature: squash repeating words into Entry<String, Long>
    .sorted(Entry.<String, Long> comparingByValue().reversed()
                 .thenComparing(Entry.comparingByKey()))
    .limit(7) // Sort and limit
    .keys() // Drop counts leaving only words
    .toList(); // Single terminal operation: store to List

Upvotes: 4

Robert Benson
Robert Benson

Reputation: 166

I'm a simple person, so I'd use a Map<String, Integer> to count each word first. Then create a TreeSet for each of the counts, and store those in a TreeMap<Integer, TreeSet>. Should be rather straightforward from there.

Upvotes: 1

Related Questions