Reputation: 790
I want to keep a count of every word in a file, and this count should not include non-letter characters such as the apostrophe, comma, full stop, question mark, exclamation mark, etc., i.e. just letters of the alphabet. I tried to use a delimiter like this, but the delimiter set doesn't cover the apostrophe.
Scanner fileScanner = new Scanner("C:\\MyJavaFolder\\JavaAssignment1\\TestFile.txt");
int totalWordCount = 0;
//Firstly to count all the words in the file without the restricted characters
while (fileScanner.hasNext()) {
    fileScanner.useDelimiter(("[.,:;()?!\" \t\n\r]+")).next();
    totalWordCount++;
}
System.out.println("There are " + totalWordCount + " word(s)");
//Then later I create an array to store each individual word in the file for counting their lengths.
Scanner fileScanner2 = new Scanner("C:\\MyJavaFolder\\JavaAssignment1\\TestFile.txt");
String[] words = new String[totalWordCount];
for (int i = 0; i < totalWordCount; ++i) {
    words[i] = fileScanner2.useDelimiter(("[.,:;()?!\" \t\n\r]+")).next();
}
This doesn't seem to work!
How can I go about this?
Upvotes: 2
Views: 4764
Reputation: 1248
You could try this regex in your delimiter:
fileScanner.useDelimiter("[^a-zA-Z']+").next();
This will use any run of characters that are neither letters nor apostrophes as a delimiter. That way your words will include the apostrophe but not any other non-letter character. (Writing it as two alternatives, "[^a-zA-Z]|[^\']", would not work, because every letter is also a non-apostrophe and would therefore count as a delimiter.)
Then you'll have to loop through each word, check for apostrophes, and account for them if you want the length to be accurate. You could just remove each apostrophe so that the length matches the number of letters in the word, or you could create word objects with their own length fields, so that you can print the word as is and still know the number of letter characters in it.
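As a rough sketch of that idea, assuming a short sample sentence in place of the file contents (the class name ApostropheDelimiterSketch and the helpers splitWords and letterCount are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ApostropheDelimiterSketch {
    // Splits the text on runs of characters that are neither letters nor apostrophes.
    static List<String> splitWords(String text) {
        List<String> words = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("[^a-zA-Z']+");
            while (scanner.hasNext()) {
                words.add(scanner.next());
            }
        }
        return words;
    }

    // Counts only the letters of a word by dropping apostrophes first.
    static int letterCount(String word) {
        return word.replace("'", "").length();
    }

    public static void main(String[] args) {
        // Sample text standing in for the file contents (an assumption).
        for (String word : splitWords("They're here, aren't they?")) {
            System.out.println(word + " has " + letterCount(word) + " letter(s)");
        }
    }
}
```

To read from the real file instead, the Scanner would be built from new File(...) rather than from a String.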
Upvotes: 0
Reputation: 813
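It seems to me that you don't want to split on anything but spaces and line endings. For example, the word "they're" would be returned as two words if you used ' as a delimiter. Also note that new Scanner(String) scans the characters of the string itself rather than opening a file, so the path has to be wrapped in new File(...). Here's how you could change your original code to make it work.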
// Needs java.io.File, java.util.ArrayList and java.util.Scanner.
// new Scanner(File) throws FileNotFoundException, so declare or catch it.
Scanner fileScanner = new Scanner(new File("C:\\MyJavaFolder\\JavaAssignment1\\TestFile.txt"));
int totalWordCount = 0;
ArrayList<String> words = new ArrayList<String>();
//Firstly to count all the words in the file without the restricted characters
while (fileScanner.hasNext()) {
    //Add words to an array list so you only have to go through the scanner once
    words.add(fileScanner.next()); //The delimiter defaults to whitespace
    totalWordCount++;
}
System.out.println("There are " + totalWordCount + " word(s)");
fileScanner.close();
Pattern.compile() turns a string into a regular expression. The predefined character class \s in the Pattern class matches any whitespace character, and Scanner's default delimiter likewise splits on whitespace.
There is more information in the Pattern documentation.
Also, make sure to close your Scanner objects when you're done. Leaving the first one open could prevent your second scanner from opening the file.
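One way to make sure the Scanner is always closed is a try-with-resources block. The following is only a sketch: the class name ScannerCloseSketch and the helper readWords are hypothetical, and the file path is expected as a command-line argument rather than hard-coded.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerCloseSketch {
    // Reads whitespace-separated tokens; the Scanner is closed when the try block exits.
    static List<String> readWords(File file) {
        List<String> words = new ArrayList<>();
        try (Scanner scanner = new Scanner(file)) {
            while (scanner.hasNext()) {
                words.add(scanner.next());
            }
        } catch (FileNotFoundException e) {
            // Rethrown unchecked only to keep this sketch short.
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("Usage: java ScannerCloseSketch <path-to-text-file>");
            return;
        }
        List<String> words = readWords(new File(args[0]));
        System.out.println("There are " + words.size() + " word(s)");
    }
}
```

Because the list is filled in a single pass, a second Scanner over the same file is not needed at all.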
Edit
If you want to count the letters per word, you can add the following code to the above code:
int totalLetters = 0;
int[] lettersPerWord = new int[words.size()];
for (int wordNum = 0; wordNum < words.size(); wordNum++)
{
    String word = words.get(wordNum);
    //Strip punctuation, whitespace and apostrophes so only the letters remain
    word = word.replaceAll("[.,:;()?!\" \t\n\r\']+", "");
    lettersPerWord[wordNum] = word.length();
    //Accumulate the running total rather than overwriting it
    totalLetters += word.length();
}
I have tested this code and it appears to work for me. According to the JavaDoc, replaceAll uses a regular expression to match, so it should match any of those characters and remove them.
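One edge case with this approach is a token made up only of punctuation (for example "--"), which becomes an empty string after stripping and probably should not count as a word. A minimal sketch that skips such tokens, assuming only the ASCII letters a-z and A-Z count as letters (the class name LetterCountSketch and the helper lettersPerWord are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

class LetterCountSketch {
    // Returns the number of letters in each whitespace-separated token,
    // ignoring tokens that contain no letters at all.
    static List<Integer> lettersPerWord(String text) {
        List<Integer> lengths = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            // Keep only the letters of the token.
            String letters = token.replaceAll("[^a-zA-Z]", "");
            if (!letters.isEmpty()) {
                lengths.add(letters.length());
            }
        }
        return lengths;
    }

    public static void main(String[] args) {
        // Sample text standing in for the file contents (an assumption).
        List<Integer> lengths = lettersPerWord("They're here -- aren't they?");
        int totalLetters = 0;
        for (int length : lengths) {
            totalLetters += length;
        }
        System.out.println(lengths.size() + " word(s), " + totalLetters + " letter(s)");
    }
}
```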
Upvotes: 2
Reputation: 9491
Scanner.useDelimiter(String) does compile its argument as a regular expression, so "[.,:;()?!\" \t\n\r]+" is applied as a pattern rather than as a literal string.
If the delimiter still doesn't give you what you want, you can use the regex classes directly instead.
Pattern and Matcher, together with the group method, may be what you're looking for:
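// Needs java.util.regex.Pattern and java.util.regex.Matcher.
// 'test' is the String to search, for example one line of the file.
String pattern = "(.*)[.,:;()?!\" \t\n\r]+(.*)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(test);
if (m.find()) {
    // group(1) is the text captured before the delimiter run
    System.out.println("Found value: " + m.group(1));
}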
Play with those classes and you will probably find they get you much closer to what you need.
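For instance, rather than describing the separators, the pattern can describe the words themselves and Matcher.find() can be called in a loop. This is a sketch under the assumption that a word is a run of ASCII letters, optionally with apostrophes inside it; the class name WordMatcherSketch and the helper findWords are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordMatcherSketch {
    // A run of letters, optionally followed by apostrophe-plus-letters groups (e.g. "don't").
    private static final Pattern WORD = Pattern.compile("[a-zA-Z]+(?:'[a-zA-Z]+)*");

    // Collects every match of WORD in the text, in order of appearance.
    static List<String> findWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // Sample text standing in for the file contents (an assumption).
        List<String> words = findWords("Hello, world! Don't stop.");
        System.out.println("There are " + words.size() + " word(s): " + words);
    }
}
```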
Upvotes: 1