Ingen Alls
Ingen Alls

Reputation: 49

How to count the number of occurrences of words in a text

I am working on a project to write a program that finds the 10 most used words in a text, but I got stuck and don't know what I should do next. Can someone help me please?

I came this far only:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class Lab4 {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner file = new Scanner(new File("text.txt")).useDelimiter("[^a-zA-Z]+");
        List<String> words = new ArrayList<String>();
        while (file.hasNext()){
            String tx = file.next();
            // String x = file.next().toLowerCase();
            words.add(tx);
        }
        Collections.sort(words);
        // System.out.println(words);
    }
}

Upvotes: 3

Views: 29080

Answers (5)

yankee
yankee

Reputation: 40870

Here is an even shorter version than the one from lbalazscs that also uses Java 8's streaming API;

Arrays.stream(new String(Files.readAllBytes(PATH_TO_FILE), StandardCharsets.UTF_8).split("\\W+"))
            .collect(Collectors.groupingBy(Function.<String>identity(), HashMap::new, counting()))
            .entrySet()
            .stream()
            .sorted(((o1, o2) -> o2.getValue().compareTo(o1.getValue())))
            .limit(10)
            .forEach(System.out::println);

This will do everything in one go: Load the file, split by non word characters, group the everything by word and assign word count to each group and then for the top ten word print the words with count.

For some indepth discussion about a very similar setup see also: https://stackoverflow.com/a/33946927/327301

Upvotes: 1

lbalazscs
lbalazscs

Reputation: 17809

You can use a Guava Multiset, here is a word-counting example: http://code.google.com/p/guava-libraries/wiki/NewCollectionTypesExplained

And here is how to find the words with the highest count in a Multiset: Simplest way to iterate through a Multiset in the order of element frequency?

UPDATE I wrote this answer in 2012. Since then we have Java 8, and now it is possible to find the 10 most used words in a few lines without external libraries:

List<String> words = ...

// map the words to their count
Map<String, Integer> frequencyMap = words.stream()
         .collect(toMap(
                s -> s, // key is the word
                s -> 1, // value is 1
                Integer::sum)); // merge function counts the identical words

// find the top 10
List<String> top10 = words.stream()
        .sorted(comparing(frequencyMap::get).reversed()) // sort by descending frequency
        .distinct() // take only unique values
        .limit(10)   // take only the first 10
        .collect(toList()); // put it in a returned list

System.out.println("top10 = " + top10);

The static imports are:

import static java.util.Comparator.comparing;
import static java.util.stream.Collectors.toList;
import static java.util.stream.Collectors.toMap;

Upvotes: 10

Narotam Kumar
Narotam Kumar

Reputation: 1

package src;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.Map.Entry;

public class ScannerTest
{
    public static void main(String[] args) throws FileNotFoundException
        {
        Scanner scanner = new Scanner(new File("G:/Script_nt.txt")).useDelimiter("[^a-zA-Z]+");
        Map<String, Integer> map = new HashMap<String, Integer>();
        while (scanner.hasNext())
            {
            String word = scanner.next();
            if (map.containsKey(word))
                {
                map.put(word, map.get(word)+1);
                }
            else
                {
                map.put(word, 1);
                }
            }

        List<Map.Entry<String, Integer>> entries = new ArrayList<Entry<String,Integer>>( map.entrySet());

        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {

            @Override
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                return a.getValue().compareTo(b.getValue());
            }
        });

        for(int i = 0; i < map.size(); i++){
            System.out.println(entries.get(entries.size() - i - 1).getKey()+" "+entries.get(entries.size() - i - 1).getValue());
        }
        }
}

Upvotes: 0

Rais Alam
Rais Alam

Reputation: 7016

Create in input as a string from file or command line and pass it to below method it will return a map containing words as a key and values as their occurrence or count in that sentence or paragraph.

public Map<String,Integer> getWordsWithCount(String sentances)
{
    Map<String,Integer> wordsWithCount = new HashMap<String, Integer>();

    String[] words = sentances.split(" ");
    for (String word : words)
    {
        if(wordsWithCount.containsKey(word))
        {
            wordsWithCount.put(word, wordsWithCount.get(word)+1);
        }
        else
        {
            wordsWithCount.put(word, 1);
        }

    }

    return wordsWithCount;

}

Upvotes: -1

rtheunissen
rtheunissen

Reputation: 7435

Create a map to keep track of occurrences like so:

   Scanner file = new Scanner(new File("text.txt")).useDelimiter("[^a-zA-Z]+");
   HashMap<String, Integer> map = new HashMap<>();

   while (file.hasNext()){
        String word = file.next().toLowerCase();
        if (map.containsKey(word)) {
            map.put(word, map.get(word) + 1);
        } else {
            map.put(word, 0);
        }
    }

    ArrayList<Map.Entry<String, Integer>> entries = new ArrayList<>(map.entrySet());
    Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {

        @Override
        public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
            return a.getValue().compareTo(b.getValue());
        }
    });

    for(int i = 0; i < 10; i++){
        System.out.println(entries.get(entries.size() - i - 1).getKey());
    }

Upvotes: 4

Related Questions