icelated

Reputation: 45

Find unique words in a file - Java

From an MS-DOS window I am piping in a file, amazon.txt, and I am trying to use the collections framework. Keep in mind I want to keep this as simple as possible. What I want to do is count the unique words in the file, so that each word is counted only once no matter how often it appears.

This is what I have so far. Please be kind, this is my first Java project.

import java.util.Scanner;
import java.util.ArrayList;
import java.util.Iterator;

public class project1 {

    // ArrayList<String> a = new ArrayList<String>();

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        String word;
        String grab;

        int count = 0;
        ArrayList<String> a = new ArrayList<String>();
        // Iterator<String> it = a.iterator();

        System.out.println("Java project\n");

        while (sc.hasNext()) {
            word = sc.next();
            if (word.equals("---")) {
                break; // stop at the terminator without storing it
            }
            a.add(word);
        }

        Iterator<String> it = a.iterator();

        while (it.hasNext()) {
            grab = it.next();

            if (grab.contains("a")) {
                System.out.println(grab); // Just a check to see; a second it.next() here would skip a word
                count++;
            }
        }
        System.out.println("I counted abc = ");
        System.out.println(count);
        System.out.println("\nbye...");
    }
}

Upvotes: 0

Views: 5356

Answers (2)

Andreas Dolk

Reputation: 114787

In your version, the word list a will contain all words, duplicates included. You can either

(a) check, for every new word, whether it is already in the list (List#contains is the method you should call), or, as the recommended solution,

(b) replace ArrayList<String> with TreeSet<String>. This will eliminate duplicates automatically and store the words in alphabetical order.

Edit

If you want to count the unique words, then do the same as above; the desired result is the collection's size. So if you entered the sequence "a a b c ---", the result would be 3, as there are three unique words (a, b and c), provided the "---" terminator itself is not added to the collection.
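A minimal sketch of option (b) with the size as the result might look like the following. The class name UniqueWords and the helper readUniqueWords are made up for illustration, and the sketch assumes the "---" terminator from the question should not be counted as a word.

```java
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

public class UniqueWords {

    // Hypothetical helper: collects whitespace-separated words until "---"
    // or the end of the input, whichever comes first.
    public static Set<String> readUniqueWords(Scanner sc) {
        Set<String> words = new TreeSet<>(); // sorted, no duplicates
        while (sc.hasNext()) {
            String word = sc.next();
            if (word.equals("---")) {
                break; // the terminator is not stored
            }
            words.add(word); // add() leaves the set unchanged for a repeated word
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> words = readUniqueWords(new Scanner(System.in));
        System.out.println("Unique words: " + words.size());
    }
}
```

It can be run the same way as the original, for example java UniqueWords < amazon.txt. Note that the comparison is case-sensitive and punctuation stays attached to the words, so "The", "the" and "the," count as three different words unless you normalize them first.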

Upvotes: 9

lins314159

Reputation: 2520

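Instead of ArrayList<String>, use HashSet<String> (not sorted) or TreeSet<String> (sorted) if you don't need a count of how often each word occurs. If you do need the counts, use Hashtable<String,Integer> (not sorted) or TreeMap<String,Integer> (sorted). In current Java, HashMap<String,Integer> is the usual unsynchronized replacement for Hashtable.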
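As a rough sketch of the counting variant, here is one way to do it with a TreeMap. The names WordFrequency and countWords are invented for this example, Map.merge requires Java 8 or later, and the "---" terminator is again assumed not to be a word.

```java
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

public class WordFrequency {

    // Hypothetical helper: maps each word to the number of times it occurs.
    public static Map<String, Integer> countWords(Scanner sc) {
        Map<String, Integer> counts = new TreeMap<>(); // keys kept in sorted order
        while (sc.hasNext()) {
            String word = sc.next();
            if (word.equals("---")) {
                break; // the terminator is not counted
            }
            // Store 1 for a new word, otherwise add 1 to the existing count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(new Scanner(System.in));
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
        System.out.println("Unique words: " + counts.size());
    }
}
```

The number of unique words is still available as the size of the map, so this version answers both questions at once.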

If there are words you don't want, place those in a HashSet<String> and check that this set doesn't contain the word your Scanner found before placing the word into your collection. If you only want dictionary words, put your dictionary in a HashSet<String> and check that it does contain the word your Scanner found before placing the word into your collection.
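The filtering idea could be sketched roughly like this. The class FilteredWords, the helper readFiltered and the three-word UNWANTED list are all placeholders for illustration; a real list would probably be loaded from a file.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

public class FilteredWords {

    // Hypothetical list of words to ignore.
    static final Set<String> UNWANTED = new HashSet<>(Arrays.asList("a", "an", "the"));

    // Hypothetical helper: keeps each word once, skipping anything in 'unwanted'.
    public static Set<String> readFiltered(Scanner sc, Set<String> unwanted) {
        Set<String> kept = new TreeSet<>();
        while (sc.hasNext()) {
            String word = sc.next();
            if (word.equals("---")) {
                break; // assumed terminator, not stored
            }
            if (!unwanted.contains(word)) { // HashSet lookup before storing
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> kept = readFiltered(new Scanner(System.in), UNWANTED);
        System.out.println("Unique words kept: " + kept.size());
    }
}
```

For the dictionary case the only change would be the condition: drop the negation so that a word is kept only when the dictionary set contains it.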

Upvotes: 3
