Reputation: 49
I am working on a project to write a program that finds the 10 most used words in a text, but I got stuck and don't know what I should do next. Can someone help me please?
This is as far as I got:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;
public class Lab4 {
public static void main(String[] args) throws FileNotFoundException {
Scanner file = new Scanner(new File("text.txt")).useDelimiter("[^a-zA-Z]+");
List<String> words = new ArrayList<String>();
while (file.hasNext()){
String tx = file.next();
// String x = file.next().toLowerCase();
words.add(tx);
}
Collections.sort(words);
// System.out.println(words);
}
}
Upvotes: 3
Views: 29080
Reputation: 40870
Here is an even shorter version than the one from lbalazscs that also uses the Java 8 Stream API:
Arrays.stream(new String(Files.readAllBytes(PATH_TO_FILE), StandardCharsets.UTF_8).split("\\W+"))
.collect(Collectors.groupingBy(Function.<String>identity(), HashMap::new, Collectors.counting()))
.entrySet()
.stream()
.sorted(((o1, o2) -> o2.getValue().compareTo(o1.getValue())))
.limit(10)
.forEach(System.out::println);
This does everything in one go: it loads the file, splits it on non-word characters, groups the tokens by word, counts the size of each group, and prints the ten most frequent words with their counts.
For an in-depth discussion of a very similar setup, see also: https://stackoverflow.com/a/33946927/327301
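For anyone who wants to try the pipeline without a file on disk, here is a minimal self-contained sketch of the same idea. The class name `TopWordsStream` and the helper `topWords` are made up for illustration. Unlike the snippet above, it lower-cases the text, drops the empty token that `split` produces when the text starts with a separator, and breaks ties alphabetically; those are my assumptions, not part of the original answer.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TopWordsStream {

    // Hypothetical helper: returns up to n (word, count) entries, most frequent first.
    public static List<Map.Entry<String, Long>> topWords(String text, int n) {
        // Order by count descending, then by word so ties come out in a stable order.
        Comparator<Map.Entry<String, Long>> byCountDesc =
                Map.Entry.comparingByValue(Comparator.reverseOrder());
        Comparator<Map.Entry<String, Long>> order =
                byCountDesc.thenComparing(Map.Entry.comparingByKey());

        return Arrays.stream(text.toLowerCase().split("\\W+"))
                .filter(w -> !w.isEmpty()) // skip the empty leading token, if any
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet()
                .stream()
                .sorted(order)
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints "the=3" and then "and=2".
        topWords("The cat and the hat and the bat", 2).forEach(System.out::println);
    }
}
```

To read from a file instead, the string passed to `topWords` could come from `Files.readAllBytes` as in the snippet above.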
Upvotes: 1
Reputation: 17809
You can use a Guava Multiset; here is a word-counting example: http://code.google.com/p/guava-libraries/wiki/NewCollectionTypesExplained
And here is how to find the words with the highest count in a Multiset: "Simplest way to iterate through a Multiset in the order of element frequency?"
UPDATE: I wrote this answer in 2012. Java 8 has been released since then, and it is now possible to find the 10 most used words in a few lines without external libraries:
List<String> words = ...
// map the words to their count
Map<String, Integer> frequencyMap = words.stream()
.collect(toMap(
s -> s, // key is the word
s -> 1, // value is 1
Integer::sum)); // merge function counts the identical words
// find the top 10
List<String> top10 = words.stream()
.sorted(comparing(frequencyMap::get).reversed()) // sort by descending frequency
.distinct() // take only unique values
.limit(10) // take only the first 10
.collect(toList()); // put it in a returned list
System.out.println("top10 = " + top10);
The static imports are:
import static java.util.Comparator.comparing;
import static java.util.stream.Collectors.toList;
import static java.util.stream.Collectors.toMap;
Upvotes: 10
Reputation: 1
package src;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.Map.Entry;
public class ScannerTest
{
public static void main(String[] args) throws FileNotFoundException
{
Scanner scanner = new Scanner(new File("G:/Script_nt.txt")).useDelimiter("[^a-zA-Z]+");
Map<String, Integer> map = new HashMap<String, Integer>();
while (scanner.hasNext())
{
String word = scanner.next();
if (map.containsKey(word))
{
map.put(word, map.get(word)+1);
}
else
{
map.put(word, 1);
}
}
List<Map.Entry<String, Integer>> entries = new ArrayList<Entry<String,Integer>>( map.entrySet());
Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
@Override
public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
return a.getValue().compareTo(b.getValue());
}
});
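// print only the 10 most frequent words (or fewer, if the text has fewer distinct words)
for(int i = 0; i < Math.min(10, entries.size()); i++){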
System.out.println(entries.get(entries.size() - i - 1).getKey()+" "+entries.get(entries.size() - i - 1).getValue());
}
}
}
Upvotes: 0
Reputation: 7016
Read the input into a string, from a file or from the command line, and pass it to the method below. It returns a map whose keys are the words and whose values are the number of times each word occurs in that sentence or paragraph.
public Map<String,Integer> getWordsWithCount(String sentances)
{
Map<String,Integer> wordsWithCount = new HashMap<String, Integer>();
String[] words = sentances.split(" ");
for (String word : words)
{
if(wordsWithCount.containsKey(word))
{
wordsWithCount.put(word, wordsWithCount.get(word)+1);
}
else
{
wordsWithCount.put(word, 1);
}
}
return wordsWithCount;
}
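The map on its own does not answer the "10 most used" part of the question; its entries still have to be ranked. A minimal sketch of that step follows. The class `WordCountRanking` and the helper `topN` are hypothetical names, and the alphabetical tie-break is my own assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountRanking {

    // Hypothetical helper: returns up to n words from the map, most frequent first.
    public static List<String> topN(Map<String, Integer> wordsWithCount, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(wordsWithCount.entrySet());
        // Sort by count descending; break ties alphabetically so the result is deterministic.
        entries.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("to", 4);
        counts.put("be", 2);
        counts.put("or", 1);
        System.out.println(topN(counts, 2)); // prints [to, be]
    }
}
```

Calling `topN(getWordsWithCount(text), 10)` would then give the ten most used words.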
Upvotes: -1
Reputation: 7435
Create a map to keep track of occurrences like so:
Scanner file = new Scanner(new File("text.txt")).useDelimiter("[^a-zA-Z]+");
HashMap<String, Integer> map = new HashMap<>();
while (file.hasNext()){
String word = file.next().toLowerCase();
if (map.containsKey(word)) {
map.put(word, map.get(word) + 1);
} else {
map.put(word, 1); // first occurrence counts as 1
}
}
ArrayList<Map.Entry<String, Integer>> entries = new ArrayList<>(map.entrySet());
Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
@Override
public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
return a.getValue().compareTo(b.getValue());
}
});
for(int i = 0; i < 10 && i < entries.size(); i++){ // guard against texts with fewer than 10 distinct words
System.out.println(entries.get(entries.size() - i - 1).getKey());
}
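On Java 8 or later, the `containsKey`/`put` branches in the counting loop can be collapsed into a single `Map.merge` call. Here is a minimal sketch of just that step, reading from a string instead of `text.txt` so it runs on its own; the class `MergeCount` and the method `countWords` are names I made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class MergeCount {

    // Hypothetical helper: counts lower-cased words using the same delimiter as above.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> map = new HashMap<>();
        try (Scanner scanner = new Scanner(text).useDelimiter("[^a-zA-Z]+")) {
            while (scanner.hasNext()) {
                // Stores 1 for a new word, otherwise adds 1 to the existing count.
                map.merge(scanner.next().toLowerCase(), 1, Integer::sum);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("To be, or not to be");
        System.out.println(counts.get("to")); // prints 2
    }
}
```

The sorting and printing steps stay the same as in the code above.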
Upvotes: 4