JeyKey

Reputation: 423

How to count words in a File when File has multiple spaces? - Java

I tried to implement the functionality of the Linux command "wc filename". This command counts the number of lines, words and bytes in a file (and characters, with the -m option).

Here is my code:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

public class wc {
    public static void main(String[] args) throws IOException {
        // counters
        int charsCount = 0;
        int wordsCount = 0;
        int linesCount = 0;

        File file = new File("Sample.txt");

        try (Scanner scanner = new Scanner(new BufferedReader(new FileReader(file)))) {
            while (scanner.hasNextLine()) {
                String tmpStr = scanner.nextLine();
                if (!tmpStr.isEmpty()) {
                    // counts non-whitespace characters only
                    String replaceAll = tmpStr.replaceAll("\\s+", "");
                    charsCount += replaceAll.length();
                    // splitting on a single space yields empty tokens wherever
                    // several spaces follow each other, so words are over-counted
                    wordsCount += tmpStr.split(" ").length;
                }
                ++linesCount;
            }
        }

        System.out.println("# of chars: " + charsCount);
        System.out.println("# of words: " + wordsCount);
        System.out.println("# of lines: " + linesCount);
        System.out.println("# of bytes: " + file.length());
    }
}

The problem is that in a file there is text like this:

Hex Description                 Hex Description

20  SPACE
21  EXCLAMATION MARK            A1  INVERTED EXCLAMATION MARK
22  QUOTATION MARK              A2  CENT SIGN
23  NUMBER SIGN                 A3  POUND SIGN

The words are separated by runs of spaces of varying length: sometimes two, sometimes more. How can I refactor my code to count words properly? How do I get rid of the multiple spaces?
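To see why the word count comes out too high, here is a minimal sketch that splits one line of the sample text on a single space, as the code above does. The class and method names are made up for illustration.

```java
import java.util.Arrays;

public class SingleSpaceSplitDemo {
    // Mirrors the question's word counting: split the line on exactly one space
    static String[] splitOnSingleSpace(String line) {
        return line.split(" ");
    }

    public static void main(String[] args) {
        // A line from the sample file: two spaces between "20" and "SPACE"
        String[] tokens = splitOnSingleSpace("20  SPACE");

        // The second space produces an empty token between the two words
        System.out.println(Arrays.toString(tokens)); // [20, , SPACE]
        System.out.println(tokens.length);           // 3, although the line has 2 words
    }
}
```

Every extra space between two words adds one empty string to the result, and each of those empty strings is counted as a word.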

Upvotes: 2

Views: 577

Answers (3)

nagendra547

Reputation: 6302

@Marvin has already suggested a solution here.

This is another way to split strings that contain runs of spaces:

s.split("[ ]+")

should also work fine for you.

Example

String s = "This is     my test    file.";
String[] s1 = s.split("[ ]+");
System.out.println(s1.length);

Output:

5
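One difference between the two regexes may matter for a file like the one in the question: "[ ]+" matches only the space character, while "\\s+" matches any whitespace, including tabs. The question does not say whether the file contains tabs, so the line below is a hypothetical one with a tab in it; the class and method names are also just for illustration.

```java
public class SpacesVsWhitespace {
    // "[ ]+" matches runs of the space character only
    static int countBySpaces(String line) {
        return line.split("[ ]+").length;
    }

    // "\\s+" matches runs of any whitespace character (space, tab, newline, ...)
    static int countByWhitespace(String line) {
        return line.split("\\s+").length;
    }

    public static void main(String[] args) {
        // Hypothetical line where one separator is a tab instead of spaces
        String line = "23  NUMBER SIGN\tA3  POUND SIGN";
        System.out.println(countBySpaces(line));     // 5: "SIGN\tA3" stays one token
        System.out.println(countByWhitespace(line)); // 6
    }
}
```

If the file is known to use only spaces, both regexes give the same result.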

Upvotes: 0

Marvin

Reputation: 14255

String#split accepts a regular expression, so you can simply split on \\s+ (one or more whitespace characters):

public static void main (String[] args) {
    String input = "Some input  with     more     than   one   space";
    String[] words = input.split("\\s+");
    System.out.println(words.length + " words");
}

Output:

7 words

See on ideone.com.
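One caveat with split("\\s+"): if a line starts with whitespace, the result begins with an empty string, which is then counted as a word. A minimal sketch of one way around this, trimming the line first and treating an empty line as zero words (the class and the countWords helper are hypothetical names):

```java
public class LeadingWhitespaceDemo {
    // trim() strips leading/trailing whitespace, so split() cannot return a
    // leading empty string; an empty (or all-whitespace) line counts as 0 words
    static int countWords(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        String indented = "   indented line";
        System.out.println(indented.split("\\s+").length); // 3: ["", "indented", "line"]
        System.out.println(countWords(indented));          // 2
        System.out.println(countWords(""));                // 0
    }
}
```

The lines in the sample file do not appear to start with spaces, but an indented line elsewhere in the file would otherwise be over-counted by one.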

Upvotes: 4

assylias
assylias

Reputation: 328598

split takes a regex too, so this should work:

tmpStr.split("\\s+")
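Dropped into a line-by-line loop like the one in the question, this could look roughly like the sketch below. It is not the original poster's code: the class WordCount and the method countLinesAndWords are hypothetical, it only counts lines, words and bytes, it reads the file with Files.newBufferedReader (which decodes it as UTF-8), and it trims each line before splitting. Like the question's Scanner loop, it counts a final line that has no trailing newline, which wc -l does not.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WordCount {
    // Returns {lines, words} for the text read from the given reader
    static long[] countLinesAndWords(BufferedReader reader) throws IOException {
        long lines = 0;
        long words = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            lines++;
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                // split on runs of whitespace, as suggested in the answers
                words += trimmed.split("\\s+").length;
            }
        }
        return new long[] {lines, words};
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "Sample.txt");
        try (BufferedReader reader = Files.newBufferedReader(path)) {
            long[] counts = countLinesAndWords(reader);
            System.out.println("# of lines: " + counts[0]);
            System.out.println("# of words: " + counts[1]);
            System.out.println("# of bytes: " + Files.size(path));
        }
    }
}
```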

Upvotes: 0
