Reputation: 1991
I am writing a program that scans text files and then writes each word into a HashMap.
The Scanner class uses whitespace as its default delimiter, but my words end up stored with punctuation attached to them. I want the scanner to treat periods, commas, and other common punctuation marks as token boundaries too. Here's what I have attempted:
Scanner line_scanner = new Scanner(line).useDelimiter("[.,:;()?!\" \t]+~\\s");
The scanner basically ignored all the spaces, even though I have '\\s' as part of the expression. Sorry, but I have hardly any understanding of regex.
Upvotes: 0
Views: 16762
Reputation: 109597
You could treat any run of characters that are not Unicode letters as the delimiter:
useDelimiter("[^\\p{L}\\p{M}]+");
([^...] means "not", \p{...} matches a Unicode category, L is the letters, and M is the combining marks such as diacritical accents.)
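A minimal sketch of this delimiter applied to the question's word-to-HashMap task. The class `UnicodeWordCount` and method `countWords` are hypothetical names, and storing a per-word count with `Map.merge` is an assumption about what the map should hold. The sample input uses both a precomposed é (`\u00e9`, a letter) and a plain `e` followed by a combining acute accent (`\u0301`, category M) to show why `\p{M}` is in the class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class UnicodeWordCount {

    // Hypothetical helper: counts words in one line, treating every run of
    // characters that are neither letters (\p{L}) nor combining marks (\p{M})
    // as a single delimiter. Leading and trailing punctuation is skipped.
    static Map<String, Integer> countWords(String line) {
        Map<String, Integer> counts = new HashMap<>();
        try (Scanner scanner = new Scanner(line).useDelimiter("[^\\p{L}\\p{M}]+")) {
            while (scanner.hasNext()) {
                // Assumption: the map stores how often each word occurs.
                counts.merge(scanner.next(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // "cafe\u0301" keeps its combining accent because \p{M} is not a delimiter.
        System.out.println(countWords("Caf\u00e9, cafe\u0301! (na\u00efve) words: words."));
    }
}
```

One trade-off of this pattern: an apostrophe is neither a letter nor a mark, so a contraction such as don't is split into don and t.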
Upvotes: 0
Reputation: 63698
Scanner line_scanner = new Scanner(line).useDelimiter("[.,:;()?!\"\\s]+");
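Why the original attempt failed: in `[.,:;()?!\" \t]+~\\s` the `~` is a literal tilde and `\\s` sits outside the brackets, so the delimiter only matches "a run of punctuation, then a tilde, then one whitespace character". Ordinary text never contains that, so the Scanner finds no delimiter and returns the whole line as one token. Moving `\\s` inside the character class makes any run of punctuation and/or whitespace a separator. A minimal sketch comparing the two patterns; `DelimiterComparison` and `tokens` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterComparison {

    // Hypothetical helper: collects every token the Scanner produces
    // for the given delimiter pattern.
    static List<String> tokens(String line, String delimiter) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(line).useDelimiter(delimiter)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "Hello, world. How are you?";
        // Original attempt: needs a literal '~' after the punctuation run,
        // which never occurs, so the entire line is a single token.
        System.out.println(tokens(line, "[.,:;()?!\" \t]+~\\s"));
        // Fixed: \s inside the class, so punctuation and whitespace both split tokens.
        System.out.println(tokens(line, "[.,:;()?!\"\\s]+"));
    }
}
```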
Upvotes: 4