Reputation: 55
I'm trying to use the Scanner class to parse all the words in a file. The file contains ordinary text, but I only want the words, without any punctuation. The solution I have so far is not complete, but it's already giving me a problem:
Scanner fileScan = new Scanner(file);
String word;
while (fileScan.hasNext("[^ ,!?.]+")) {
    word = fileScan.next();
    this.addToIndex(word, filename);
}
Now, if I use this on a sentence like "hi my name is mario!", it returns just "hi", "my", "name" and "is". It doesn't match "mario!" (obviously), but it also doesn't match "mario", which I think it should.
Can you explain why that is, and help me find a better solution if you have one? Thank you.
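Why the loop stops: Scanner.hasNext(String pattern) returns true only if the whole next token, as split by the current delimiter (whitespace by default), matches the pattern. It does not search for a matching substring inside the token. The fifth token is "mario!", which contains "!", so the test fails and the loop ends even though input remains. A minimal sketch reproducing this, assuming a String in place of the file; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class HasNextPatternDemo {
    // Mirrors the question's loop: collects tokens only while the *whole*
    // next token (split on whitespace by default) matches the pattern.
    static List<String> matchedTokens(Scanner scan, String pattern) {
        List<String> matched = new ArrayList<>();
        while (scan.hasNext(pattern)) {
            matched.add(scan.next());
        }
        return matched;
    }

    public static void main(String[] args) {
        try (Scanner scan = new Scanner("hi my name is mario!")) {
            System.out.println(matchedTokens(scan, "[^ ,!?.]+")); // [hi, my, name, is]
            // The loop stopped early, but input is left: the next token is "mario!"
            System.out.println(scan.hasNext() ? scan.next() : "(no more input)"); // mario!
        }
    }
}
```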
Upvotes: 0
Views: 7137
Reputation: 597224
Since you want to get rid of the punctuation, you can simply strip all punctuation marks from each word before adding it to the index:
word = word.replaceAll("\\p{Punct}", "");
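For hyphens or other punctuation marks that stand alone as a token, the result is an empty string, so just check word.isEmpty() before adding.
Of course, you'd have to drop the custom pattern and call plain hasNext(), so that tokens such as "mario!" are read at all.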
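A minimal sketch of this approach, assuming a String input instead of the file and a hypothetical words() helper that collects results into a list in place of addToIndex (which isn't shown in the question). Note that \p{Punct} matches ASCII punctuation only, and it also removes apostrophes, so "don't" becomes "dont".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class StripPunctuation {
    // Splits text on whitespace (Scanner's default delimiter) and removes
    // ASCII punctuation (\p{Punct}) from each token.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNext()) {            // no pattern: accept every token
                String word = scan.next().replaceAll("\\p{Punct}", "");
                if (!word.isEmpty()) {          // skip tokens that were only punctuation, e.g. "-"
                    result.add(word);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("hi my name is mario! - bye.")); // [hi, my, name, is, mario, bye]
    }
}
```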
Upvotes: 0
Reputation: 28761
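This works: instead of testing each token against a pattern, make the punctuation marks part of the delimiter, so Scanner splits on them and never returns them: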
import java.util.*;

class S {
    public static void main(String[] args) {
        // Runs of spaces and the marks , ! ? . all act as delimiters
        Scanner fileScan = new Scanner("hi my name is mario!").useDelimiter("[ ,!?.]+");
        String word;
        while (fileScan.hasNext()) {
            word = fileScan.next();
            System.out.println(word);
        }
    } // end of main()
}
javac -g S.java && java S
hi
my
name
is
mario
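One caveat when reading from a file as in the question: the delimiter [ ,!?.]+ contains only a plain space, not tabs or line breaks, so a newline stays inside a token (for "mario!\nnice", Scanner returns "mario" and then "\nnice"). Adding \s to the character class avoids that. A sketch, assuming UTF-8 input and a temporary file as hypothetical stand-in data; FileWords and words() are illustrative names:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class FileWords {
    // \s covers spaces, tabs and line breaks, so words on different
    // lines are not glued together into one token.
    static final String DELIMITER = "[\\s,!?.]+";

    static List<String> words(File file) throws FileNotFoundException {
        List<String> result = new ArrayList<>();
        // try-with-resources closes the Scanner (and the underlying file)
        try (Scanner fileScan = new Scanner(file, "UTF-8").useDelimiter(DELIMITER)) {
            while (fileScan.hasNext()) {
                result.add(fileScan.next());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input: a temporary file standing in for the real text file
        File file = File.createTempFile("words", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(), "hi my name is mario!\nnice to meet you.\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(words(file)); // [hi, my, name, is, mario, nice, to, meet, you]
    }
}
```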
Upvotes: 5