ary

Reputation: 91

How to stop a Java spell checker program from reporting the same misspelled word repeatedly

I've implemented a program that does the following:

  1. Scan all of the words in a web page into a string (using jsoup)
  2. Filter out all of the HTML markup and code
  3. Put these words into a spell checking program and offer suggestions

The spell checking program loads a dictionary.txt file into a Hashtable and compares each word of the input string to the words in the dictionary.

My current problem is that when the input contains the same word multiple times, such as "teh program is teh worst", the code will print out

You entered 'teh', did you mean 'the'?
You entered 'teh', did you mean 'the'?

A website will sometimes repeat the same misspelled word many times, and the output becomes messy.

If possible, printing each word along with how many times it was misspelled would be perfect, but limiting each word to being printed once would be good enough.

My program has a handful of methods and two classes, but the spell checking method is below:

Note: the original code contains some 'if' statements that remove punctuation marks but I've removed them for clarity.

static boolean suggestWord;

public static String checkWord(String wordToCheck) {
    String wordCheck;
    String word = wordToCheck.toLowerCase();

    if ((wordCheck = (String) dictionary.get(word)) != null) {
        suggestWord = false; // no need to ask for suggestion for a correct
                             // word.
        return wordCheck;
    }

    // If after all of these checks a word could not be corrected, return as
    // a misspelled word.
    return word;
}

TEMPORARY EDIT: As requested, the complete code:

Class 1:

public class ParseCleanCheck {

        static Hashtable<String, String> dictionary;// To store all the  words of the
        // dictionary
        static boolean suggestWord;// To indicate whether the word is spelled
                                    // correctly or not.

        static Scanner urlInput = new Scanner(System.in);
        public static String cleanString;
        public static String url = "";
        public static boolean correct = true;


        /**
         * PARSER METHOD
         */
        public static void PageScanner() throws IOException {
            System.out.println("Pick an english website to scan.");

            // This do-while loop allows the user to try again after a mistake
            do {
                try {
                    System.out.println("Enter a URL, starting with http://");
                    url = urlInput.nextLine();
                    // This creates a document out of the HTML on the web page
                    Document doc = Jsoup.connect(url).get();
                    // This converts the document into a string to be cleaned
                    String htmlToClean = doc.toString();
                    cleanString = Jsoup.clean(htmlToClean, Whitelist.none());


                    correct = false;
                } catch (Exception e) {
                    System.out.println("Incorrect format for a URL. Please try again.");
                }
            } while (correct);
        }

        /**
         * SPELL CHECKER METHOD
         */
        public static void SpellChecker() throws IOException {
            dictionary = new Hashtable<String, String>();
            System.out.println("Searching for spelling errors ... ");

            try {
                // Read and store the words of the dictionary
                BufferedReader dictReader = new BufferedReader(new FileReader("dictionary.txt"));

                while (dictReader.ready()) {
                    String dictInput = dictReader.readLine();
                    String[] dict = dictInput.split("\\s"); // create an array of
                                                            // dictionary words

                    for (int i = 0; i < dict.length; i++) {
                        // key and value are identical
                        dictionary.put(dict[i], dict[i]);
                    }
                }
                dictReader.close();
                String user_text = "";

                // Initializing a spelling suggestion object based on probability
                SuggestSpelling suggest = new SuggestSpelling("wordprobabilityDatabase.txt");

                // get user input for correction
                {

                    user_text = cleanString;
                    String[] words = user_text.split(" ");

                    int error = 0;

                    for (String word : words) {
                        if(!dictionary.contains(word)) {
                            checkWord(word);


                            dictionary.put(word, word);
                        }
                        suggestWord = true;
                        String outputWord = checkWord(word);

                        if (suggestWord) {
                            System.out.println("Suggestions for " + word + " are:  " + suggest.correct(outputWord) + "\n");
                            error++;
                        }
                    }

                    if (error == 0) {
                        System.out.println("No mistakes found");
                    }
                }

            } catch (IOException e) {
                e.printStackTrace();
                System.exit(-1);
            }
        }

        /**
         * METHOD TO SPELL CHECK THE WORDS IN A STRING. IS USED IN SPELL CHECKER
         * METHOD THROUGH THE "WORD" STRING
         */

        public static String checkWord(String wordToCheck) {
            String wordCheck;
            String word = wordToCheck.toLowerCase();

            if ((wordCheck = (String) dictionary.get(word)) != null) {
                suggestWord = false; // no need to ask for suggestion for a correct
                                     // word.
                return wordCheck;
            }

            // If after all of these checks a word could not be corrected, return as
            // a misspelled word.
            return word;
        }
}

There is a second class (SuggestSpelling.java) which holds a probability calculator, but that isn't relevant right now unless you plan to run the code yourself.

Upvotes: 3

Views: 1485

Answers (1)

Kaidul

Reputation: 15885

Use a HashSet to detect duplicates:

Set<String> wordSet = new HashSet<>();

Then store each word of the input in it. If a word is already in the HashSet, don't call checkWord(String wordToCheck) for that word. Something like this:

String[] words = user_text.split(" "); // split the input into words
for (String word : words) {
    if(!wordSet.contains(word)) {
        checkWord(word);
        // do stuff
        wordSet.add(word);
    }
}
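
To make the idea above concrete, here is a minimal self-contained sketch. The class name UniqueMisspellings, the method findUniqueMisspellings, and the four-word dictionary are hypothetical and used only for illustration; they are not part of the code in the question. It also relies on Set.add returning false when the element is already present, which folds the contains and add calls into one step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqueMisspellings {

    // Returns each misspelled word once, in order of first appearance.
    // "dictionary" is a hypothetical stand-in for the words in dictionary.txt.
    static List<String> findUniqueMisspellings(String text, Set<String> dictionary) {
        Set<String> seen = new HashSet<>();
        List<String> misspelled = new ArrayList<>();

        for (String raw : text.split("\\s+")) {
            String word = raw.toLowerCase();
            if (word.isEmpty()) {
                continue; // leading whitespace produces an empty first token
            }
            // Set.add returns false if the word was already seen, so skip it
            if (!seen.add(word)) {
                continue;
            }
            if (!dictionary.contains(word)) {
                misspelled.add(word);
            }
        }
        return misspelled;
    }

    public static void main(String[] args) {
        Set<String> dictionary =
                new HashSet<>(Arrays.asList("the", "program", "is", "worst"));

        for (String word : findUniqueMisspellings("teh program is teh worst", dictionary)) {
            System.out.println("You entered '" + word + "'");
        }
    }
}
```

With the sample sentence from the question, 'teh' is reported once instead of twice. In the real program the suggestion lookup would go where the println is.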

Edit

// ....
{

    user_text = cleanString;
    String[] words = user_text.split(" ");
    Set<String> wordSet = new HashSet<>();

    int error = 0;

    for (String word : words) {
        // wordSet is a separate data structure used only for duplicate checking; don't mix it up with dictionary
        if(!wordSet.contains(word)) {

            // put all your logic here

            wordSet.add(word);
        }
    }

    if (error == 0) {
        System.out.println("No mistakes found");
    }
}
// .... 
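
If you also want the number of times each word was misspelled, as the question asks, one possible variation is to count into a map instead of a set. This is only a sketch under assumptions: MisspellingCounter, countMisspellings, and the small dictionary are hypothetical names and data, and a LinkedHashMap is chosen so the words print in the order they first appear.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class MisspellingCounter {

    // Maps each misspelled word to the number of times it occurs in the text.
    // "dictionary" is a hypothetical stand-in for the words in dictionary.txt.
    static Map<String, Integer> countMisspellings(String text, Set<String> dictionary) {
        Map<String, Integer> counts = new LinkedHashMap<>();

        for (String raw : text.split("\\s+")) {
            String word = raw.toLowerCase();
            if (word.isEmpty() || dictionary.contains(word)) {
                continue; // skip empty tokens and correctly spelled words
            }
            // Insert 1 the first time, otherwise add 1 to the existing count
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Set<String> dictionary =
                new HashSet<>(Arrays.asList("the", "program", "is", "worst"));

        Map<String, Integer> counts =
                countMisspellings("teh program is teh worst", dictionary);

        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println("You entered '" + entry.getKey() + "' "
                    + entry.getValue() + " time(s)");
        }
    }
}
```

Each misspelled word is then printed once with its count, and the suggestion lookup only needs to run once per distinct word.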

You have other bugs as well. For example, you pass String wordCheck as the argument of checkWord and then declare String wordCheck; again inside checkWord(), which is not right. Please check the other parts as well.

Upvotes: 5
