Reputation: 1081
I was trying to use the Java Scanner hasNext method, but I got strange results. Maybe my problem is obvious, but why does the simple expression "[a-zA-Z']+" not work for words like these: "points. anything, supervisor,"? I have tried "[\\w']+" too.
public HashMap<String, Integer> getDocumentWordStructureFromPath(File file) {
    HashMap<String, Integer> dictionary = new HashMap<>();
    try {
        Scanner lineScanner = new Scanner(file);
        while (lineScanner.hasNextLine()) {
            Scanner scanner = new Scanner(lineScanner.nextLine());
            while (scanner.hasNext("[\\w']+")) {
                String word = scanner.next().toLowerCase();
                if (word.length() > 2) {
                    int count = dictionary.containsKey(word) ? dictionary.get(word).intValue() + 1 : 1;
                    dictionary.put(word, new Integer(count));
                }
            }
            scanner.close();
        }
        //scanner.useDelimiter(DELIMITER);
        lineScanner.close();
        return dictionary;
    } catch (FileNotFoundException e) {
        e.printStackTrace();
        return null;
    }
}
Upvotes: 0
Views: 7458
Reputation: 539
Your regular expression should be [^a-zA-Z]+, used as the scanner's delimiter, because you need to split on everything that is not a letter:
// previous code...
    Scanner scanner = new Scanner(lineScanner.nextLine()).useDelimiter("[^a-zA-Z]+");
    while (scanner.hasNext()) {
        String word = scanner.next().toLowerCase();
        // ...your other code
    }
}
// ... after code
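For reference, a minimal self-contained sketch of the delimiter approach might look like the following. The class name DelimiterWordCount and the method countWords are made up for illustration. It assumes apostrophes should stay part of a word, as in the question's pattern, so ' is excluded from the delimiter, and it reads from a String rather than a File to stay short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

class DelimiterWordCount {
    // Counts words longer than two characters. Everything that is not a
    // letter or an apostrophe is treated as a delimiter (assumption: the
    // apostrophe is kept because the question's pattern allowed it).
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        try (Scanner scanner = new Scanner(text).useDelimiter("[^a-zA-Z']+")) {
            while (scanner.hasNext()) {
                String word = scanner.next().toLowerCase();
                if (word.length() > 2) {
                    // merge() inserts 1 for a new word, otherwise adds 1.
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("points. anything, supervisor, points"));
    }
}
```

Here hasNext() is called without a pattern, so the loop only ends when the input runs out; the punctuation is consumed as part of the delimiter instead of ending up inside the tokens.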
EDIT: Why doesn't it work with the hasNext(String) method?
This line:
Scanner scanner = new Scanner(lineScanner.nextLine());
creates a scanner that uses the default whitespace delimiter pattern, so for a test line such as "Hello World. A test, ok." it produces these tokens:
Hello
World.
A
test,
ok.
Then, when you call scanner.hasNext("[a-zA-Z]+"), you are asking the scanner whether the next token matches your pattern. For this example it returns true for the first token:
Hello
The next token (World.) doesn't match the pattern because of the trailing period, so scanner.hasNext("[a-zA-Z]+") returns false and the loop stops. hasNext(String) never skips a token that fails to match, so the loop ends at the first token that contains any character that is not a letter.
Hope this helps.
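The behaviour of hasNext(String) can be checked with a small sketch. The class and method names are hypothetical; the code relies only on Scanner's default whitespace delimiter and on hasNext(String) testing the next whole token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class HasNextPatternDemo {
    // Collects tokens for as long as the NEXT whitespace-separated token
    // matches the pattern in full; stops at the first token that does not.
    static List<String> tokensWhileMatching(String line, String pattern) {
        List<String> accepted = new ArrayList<>();
        try (Scanner scanner = new Scanner(line)) {
            while (scanner.hasNext(pattern)) {
                accepted.add(scanner.next());
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // prints [Hello]
        System.out.println(tokensWhileMatching("Hello World. A test, ok.", "[a-zA-Z]+"));
    }
}
```

Only the first token is collected, because "World." fails the match and nothing advances the scanner past it.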
Upvotes: 1