Reputation: 470
I have a list of words, say
List<String> words = Arrays.asList("Hello alan i am here where are you"+
"and what are you doing hello are you there");
How can I get the top seven words which are repeated more than once in the list in descending order? And then the single entry words should be arranged in alphabetical order. So the output for the above should be those top seven words
you (3)
are (2)
hello (2)
alan (1)
am (1)
and (1)
doing (1)
I am looking this to do in Java 8 using streams, lamda.
I am trying in this way. First sort the list Second get the map of words with its count of words in the list of words.
List<String> sortedWords = Arrays.asList("Hello alan i am here where are you and what are you doing hello you there".split(" "))
.stream().sorted().collect(toList());
Map<String, Long> collect =
sortedWords.stream().collect(groupingBy(Function.identity(), counting()));
Upvotes: 4
Views: 6518
Reputation: 6686
Sometimes the best solution to a problem is not an algorithm, but a data structure. I think what you need here is a Bag. Since you want the output sorted by occurrences and then by key, the specific data structure you should use is a TreeBag
. The following code will work using Eclipse Collections with Java 8 Streams:
String string =
"Hello alan i am here where are you and what are you doing hello are you there";
List<ObjectIntPair<String>> pairs =
Stream.of(string.toLowerCase().split(" "))
.collect(Collectors.toCollection(TreeBag::new))
.topOccurrences(7);
System.out.println(pairs);
This code will output:
// Strings with occurrences
[are:3, you:3, hello:2, alan:1, am:1, and:1, doing:1, here:1, i:1, there:1, what:1, where:1]
The topOccurrences()
method has logic to deal with ties, which basically leaves it to the developer to determine how they want to handle situations with ties. If you want exactly the first seven items from this list, you could then chain a call to .take(7);
The code can also be further simplified to:
List<ObjectIntPair<String>> pairs =
TreeBag.newBagWith(string.split(" ")).topOccurrences(7);
System.out.println(pairs);
The static factory method TreeBag.newBagWith()
accepts a vararg argument so you can just pass the result of String.split()
directly to it.
Note: I am a committer for Eclipse Collections.
Upvotes: 2
Reputation: 15706
Two-step solution: group/count, then process by count descending
List<String> words = Arrays.asList("Hello alan i am here where are you and what are you doing hello you there".split(" "));
Map<String, Long> collect = words.stream()
.map(String::toLowerCase) // convert to lower case
.collect( // group and count by name
Collectors.groupingBy(Function.identity(), Collectors.counting()));
collect.keySet().stream()
.sorted( // order by count descending, then by name
Comparator
.comparing(collect::get)
.reversed()
.thenComparing(Collator.getInstance()))
.map(k -> k + " (" + collect.get(k) + ")") // map to name and count string
.limit(7) // only first 7 entries
.forEach(System.out::println); // output
Upvotes: 0
Reputation: 137114
The most difficult part is the sorting. Since you want to keep only the 7 first elements from the result and you want to sort the Map by its values, we need to create a Map of all the result, sort it and then keep the 7 results.
In the following code, every word is lower-cased and grouped by themselves, counting the number of occurences. Then, we need to sort this map so we create a Stream over the entries, sort them according to the values (in descending order) and then according to the keys. The 7 first elements are retained, mapped to their key (which correspond to the word) and collected into a List
, thus keeping encounter order.
public static void main(String[] args) {
String sentence = "Hello alan i am here where are you and what are you doing hello are you there";
List<String> words = Arrays.asList(sentence.split(" "));
List<String> result =
words.stream()
.map(String::toLowerCase)
.collect(groupingBy(identity(), counting()))
.entrySet().stream()
.sorted(Map.Entry.<String, Long> comparingByValue(reverseOrder()).thenComparing(Map.Entry.comparingByKey()))
.limit(7)
.map(Map.Entry::getKey)
.collect(toList());
System.out.println(result);
}
Output:
[are, you, hello, alan, am, and, doing]
Note that you made a mistake in your wanted output: "are"
actually appears 3 times like "you"
so it should be before
NB: this code assumes a lot of static imports, namely:
import static java.util.Comparator.reverseOrder;
import static java.util.function.Function.identity;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;
Upvotes: 11
Reputation: 100249
While @Tunaki solution is great, it's interesting that using my StreamEx library, it's possible to solve the problem in single Stream pipeline (no actual operation will be performed until the single terminal operation is called):
Map<String, Long> map = StreamEx.of(words)
.map(String::toLowerCase)
.sorted() // sort original words, so now repeating words are next to each other
.runLengths() // StreamEx feature: squash repeating words into Entry<String, Long>
.sorted(Entry.<String, Long> comparingByValue().reversed()
.thenComparing(Entry.comparingByKey()))
.limit(7) // Sort and limit
.toCustomMap(LinkedHashMap::new); // Single terminal operation: store to LinkedHashMap
Or if only words are needed:
List<String> list =StreamEx.of(words)
.map(String::toLowerCase)
.sorted() // sort original words, so now repeating words are next to each other
.runLengths() // StreamEx feature: squash repeating words into Entry<String, Long>
.sorted(Entry.<String, Long> comparingByValue().reversed()
.thenComparing(Entry.comparingByKey()))
.limit(7) // Sort and limit
.keys() // Drop counts leaving only words
.toList(); // Single terminal operation: store to List
Upvotes: 4
Reputation: 166
I'm a simple person, so I'd use a Map<String, Integer>
to count each word first.
Then create a TreeSet
for each of the counts, and store those in a TreeMap<Integer, TreeSet>
. Should be rather straightforward from there.
Upvotes: 1