Reputation: 423
I tried to implement the functionality of the Linux command "wc filename". This command counts the number of lines, words, characters, and bytes in a file.
Here is my code:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

public class wc {
    public static void main(String[] args) throws IOException {
        // counters
        int charsCount = 0;
        int wordsCount = 0;
        int linesCount = 0;
        File file = new File("Sample.txt");
        try (Scanner scanner = new Scanner(new BufferedReader(new FileReader(file)))) {
            while (scanner.hasNextLine()) {
                String tmpStr = scanner.nextLine();
                if (!tmpStr.isEmpty()) {
                    // characters are counted without any whitespace
                    String replaceAll = tmpStr.replaceAll("\\s+", "");
                    charsCount += replaceAll.length();
                    wordsCount += tmpStr.split(" ").length;
                }
                ++linesCount;
            }
            System.out.println("# of chars: " + charsCount);
            System.out.println("# of words: " + wordsCount);
            System.out.println("# of lines: " + linesCount);
            System.out.println("# of bytes: " + file.length());
        }
    }
}
The problem is that the file contains text like this:
Hex Description Hex Description
20 SPACE
21 EXCLAMATION MARK A1 INVERTED EXCLAMATION MARK
22 QUOTATION MARK A2 CENT SIGN
23 NUMBER SIGN A3 POUND SIGN
There are runs of multiple spaces of different lengths: sometimes two spaces, sometimes more. How can I refactor my code so that it counts words properly? How do I get rid of the multiple spaces?
Upvotes: 2
Views: 577
Reputation: 6302
@Marvin has already suggested a solution here.
Here is another way to split a string that contains multiple spaces:
s.split("[ ]+")
should also work fine for you.
Example
String s = "This  is   my test    file.";
String[] s1 = s.split("[ ]+");
System.out.println(s1.length);
Output:-
5
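Note that "[ ]+" matches only the space character, so columns separated by tabs would not be split apart. A minimal sketch comparing the two patterns, assuming a hypothetical sample line that mixes a tab with a double space (the class and method names are illustrative only):

```java
class SplitCompare {
    // Splits on runs of the space character only; tabs stay inside tokens.
    static String[] splitOnSpaces(String line) {
        return line.split("[ ]+");
    }

    // Splits on runs of any whitespace (spaces, tabs, and so on).
    static String[] splitOnWhitespace(String line) {
        return line.split("\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical line: a tab after "21", two spaces before "MARK".
        String line = "21\tEXCLAMATION  MARK";
        System.out.println(splitOnSpaces(line).length);     // 2
        System.out.println(splitOnWhitespace(line).length); // 3
    }
}
```

If the file is known to contain only spaces, both patterns give the same count.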
Upvotes: 0
Reputation: 14255
String#split
accepts a regular expression, so you can simply split on \\s+
(one or more whitespace characters):
public static void main (String[] args) {
    String input = "Some   input with  more than    one space";
    String[] words = input.split("\\s+");
    System.out.println(words.length + " words");
}
Output:
7 words
See on ideone.com.
Upvotes: 4
Reputation: 328598
split
takes a regex too, so this should work:
tmpStr.split("\\s+")
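One caveat: if a line starts with whitespace, split("\\s+") returns an empty string as its first element, which would inflate the word count by one. A minimal sketch of a per-line helper that trims first, assuming it would replace the split(" ") call in the question's loop (WordCounter and countWords are hypothetical names, not part of the original code):

```java
class WordCounter {
    // Counts whitespace-separated words in one line.
    // Trimming first avoids the empty leading token that split("\\s+")
    // produces when the line begins with whitespace.
    static int countWords(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return 0; // blank or whitespace-only line
        }
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("  20   SPACE")); // 2
        System.out.println(countWords("   "));          // 0
    }
}
```

In the question's loop this would be used as wordsCount += WordCounter.countWords(tmpStr);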
Upvotes: 0