Luke Dolamore

Reputation: 31

Word count returns 1 for empty file

This code is for counting words in the input. It works except when no words are in the input - it returns 1 and not 0. What is wrong here?

import java.util.Scanner;

public class Exercise12 {
    public static void main(String[] args) {
        Scanner kb = new Scanner(System.in);
        System.out.println("Input, Stop by #");
        String input = kb.nextLine();
        while (!input.equals("#")) {
            wordCount(input);
            input = kb.nextLine();
        }
    } //main

    public static void wordCount(String countSpace) {
        int count = 1;
        for (int i =0; i < countSpace.length(); i++ ) {
            if ((countSpace.charAt(i)) == ' ') {
                count++;
            }
        }
        System.out.println(count);      
    }
} // class Exercise12

Upvotes: 1

Views: 441

Answers (4)

Jérôme

Reputation: 1254

To get everything right you should trim() your String to remove leading and trailing whitespace. Then split the String at each space and count all non-empty Strings. Empty Strings are caused by consecutive spaces.

Use Java 8:

// needs: import java.util.Arrays;
public static void wordCount(String countSpace) {
    System.out.println(Arrays.stream(countSpace.trim().split(" "))
            .filter(word -> !word.isEmpty())
            .count());
}
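
As a self-contained sketch of the same idea, the variant below returns the count instead of printing it and splits on the regex \\s+ so that tabs also act as separators. The class name SplitCountSketch and the method name countWords are made up for illustration, and splitting on \\s+ is my assumption rather than part of the snippet above.

```java
import java.util.Arrays;

class SplitCountSketch {
    // Hypothetical helper: counts whitespace-separated words in one line.
    static long countWords(String line) {
        return Arrays.stream(line.trim().split("\\s+"))
                // "".split(...) yields one empty element, so drop empties
                .filter(word -> !word.isEmpty())
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countWords(""));                 // prints 0
        System.out.println(countWords("  hello   world ")); // prints 2
    }
}
```

Returning the value keeps the counting logic separate from the printing, which makes it easier to check the edge cases.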

Upvotes: 3

Aleks G

Reputation: 57316

TL;DR: Use StringTokenizer:

public static void wordCount(String input) {
    int count = new java.util.StringTokenizer(input).countTokens();
    System.out.println(count);      
}

Long explanation:

Your code is almost correct, but you initialise count to 1 and then increment it for every space character that you find. There is no space after the last word, so the loop never counts the last word, and starting at 1 instead of 0 compensates for that. With empty input, however, you still start at 1 and there is nothing to read, so you end up with the wrong value.
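
To see the effect in isolation, here is a small sketch that uses the same loop as the question but returns the value instead of printing it. The names SpaceCountSketch and countBySpaces are invented for this example.

```java
class SpaceCountSketch {
    // Same logic as the question's wordCount, returning the result.
    static int countBySpaces(String line) {
        int count = 1; // silently assumes there is at least one word
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == ' ') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countBySpaces("one two three")); // prints 3
        System.out.println(countBySpaces(""));              // prints 1, not 0
    }
}
```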

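The first fix is simple: change the initialisation to be int count = 0. Empty input then prints 0, but the method now reports the number of spaces, so non-empty input such as "a b" prints 1: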

public static void wordCount(String countSpace) {
    int count = 0;
    for (int i =0; i < countSpace.length(); i++ ) {
        if ((countSpace.charAt(i)) == ' ') {
            count++;
        }
    }
    System.out.println(count);      
}

The next problem is that you're not counting words, but rather word separators. What if there are two consecutive spaces between two words? Each of them is counted. Further, what happens if you encounter end of line or end of file? If the input ends before a # line, kb.nextLine() throws a NoSuchElementException.
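
For the end-of-input case, one option is to ask the Scanner whether another line exists before reading it. The sketch below is only an illustration: ReadLoopSketch and linesBeforeStop are hypothetical names, and it reads from a String so it can be run without typing input.

```java
import java.util.Scanner;

class ReadLoopSketch {
    // Hypothetical helper: returns how many lines were read
    // before a "#" line or the end of the input, whichever comes first.
    static int linesBeforeStop(Scanner in) {
        int lines = 0;
        while (in.hasNextLine()) { // false at end of input, so no exception
            String line = in.nextLine();
            if (line.equals("#")) {
                break;
            }
            lines++;
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(linesBeforeStop(new Scanner("a b\nc\n#\nd"))); // prints 2
        System.out.println(linesBeforeStop(new Scanner("a\nb")));         // prints 2
    }
}
```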

Ideally, you should use a tokenizer to count your words, but as a minimum, you should count how many times you switched from a space/line-end to an alphanumeric character. Here's an example of using a Tokenizer:

public static void wordCount(String input) {
    int count = new java.util.StringTokenizer(input).countTokens();
    System.out.println(count);      
}
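
If you would rather not use a tokenizer, the "count the switches" approach could look roughly like the sketch below. It treats any non-whitespace character as part of a word (a simplification of "alphanumeric"), and the names TransitionCountSketch and countWords are made up for the example.

```java
class TransitionCountSketch {
    // Counts how often the text switches from whitespace (or the start
    // of the line) to a non-whitespace character.
    static int countWords(String line) {
        int count = 0;
        boolean inWord = false;
        for (int i = 0; i < line.length(); i++) {
            boolean whitespace = Character.isWhitespace(line.charAt(i));
            if (!whitespace && !inWord) {
                count++; // first character of a new word
            }
            inWord = !whitespace;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords(""));           // prints 0
        System.out.println(countWords("  a  b\tc ")); // prints 3
    }
}
```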

Upvotes: 1

Thanasis1101

Reputation: 1680

You could use the split function like this:

public static void wordCount(String countSpace) {

    String[] words = countSpace.split(" ");
    int count = words.length;

    System.out.println(count);  

}

EDIT:

As @Jérôme suggested below, I added the trim function and a check for the empty input and now it works correctly. I also changed the string in the split function to the "\s+" regex, as @Aleks G suggested. Thank you for your corrections. See the updated code below:

public static void wordCount(String countSpace) {

    String[] words = countSpace.trim().split("\\s+");

    int count = 0;
    if (!(words[0].equals(""))){
        count = words.length;
    }        

    System.out.println(count); 
}
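
The check on words[0] is needed because splitting an empty String gives an array with one empty element, not an empty array. A small sketch to try this out (SplitEdgeCaseSketch and splitLength are names invented here):

```java
class SplitEdgeCaseSketch {
    // Hypothetical helper: length of the array returned by String.split.
    static int splitLength(String text, String regex) {
        return text.split(regex).length;
    }

    public static void main(String[] args) {
        System.out.println(splitLength("", " "));              // prints 1
        System.out.println(splitLength("   ".trim(), "\\s+")); // prints 1
        System.out.println(splitLength("a  b", " "));          // prints 3
        System.out.println(splitLength("a  b", "\\s+"));       // prints 2
    }
}
```

The third line shows why a single space as separator overcounts: the two spaces produce an empty String between "a" and "b".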

Upvotes: 2

user4668606

Reputation:

You need to handle the case of empty inputs separately. In addition, you should keep in mind that an input might contain two consecutive spaces or spaces at the beginning/end of the line, which shouldn't count as words.

With these special cases, the code would look like this:

public static void wordCount(String in){
    boolean isspace = true;
    int count = 0;
    for(int i = 0; i < in.length(); i++)
    {
        //increment count on start of a word
        if(isspace && in.charAt(i) != ' ')
        {
            count++;
            isspace = false;
        }
        //reset isspace flag on end of a word
        else if(!isspace && in.charAt(i) == ' ')
            isspace = true;
    }

    System.out.println(count);
}

This code makes sure that words are only counted when they are actually encountered and repeated spaces get ignored.

Upvotes: 0
