exe163

Reputation: 1991

How to use delimiter to isolate words (Java)

I am writing a program that scans text files and then writes each word into a HashMap.

The Scanner class uses whitespace as its default delimiter, but my words ended up stored with punctuation attached to them. I want the scanner to treat periods, commas and other common punctuation marks as the end of a token. Here's what I have attempted:

    Scanner line_scanner = new Scanner(line).useDelimiter("[.,:;()?!\" \t]+~\\s");

The scanner basically ignored all the spaces even though I have '\\s' as part of the expression. Sorry, but I have hardly any understanding of regex.
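A small sketch of why the attempt fails: in the pattern "[.,:;()?!\" \t]+~\\s" the "~" is a literal tilde, so a delimiter only matches a run of punctuation, spaces or tabs that is followed by "~" and then one whitespace character. Ordinary text never contains that sequence, so the whole line comes back as a single token. The class and method names (AttemptedDelimiter, tokens) are hypothetical, chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class AttemptedDelimiter {
    // The delimiter from the question. As a regex it reads:
    //   [.,:;()?!" \t]+   one or more punctuation, space or tab characters,
    //   ~                 then a literal tilde,
    //   \s                then exactly one whitespace character.
    // A plain space is therefore never a delimiter on its own.
    static final String ATTEMPT = "[.,:;()?!\" \t]+~\\s";

    // Collects every token the Scanner produces for the given delimiter.
    static List<String> tokens(String line, String delimiter) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(line).useDelimiter(delimiter)) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // No "~" in the input, so the delimiter never matches: one token.
        System.out.println(tokens("Hello, world. How are you?", ATTEMPT));
        // Only ", ~ " matches the pattern, so this splits into "Hi" and "there".
        System.out.println(tokens("Hi, ~ there", ATTEMPT));
    }
}
```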

Upvotes: 0

Views: 16762

Answers (2)

Joop Eggen

Reputation: 109597

You could treat everything that is not a Unicode letter as a delimiter:

    Scanner line_scanner = new Scanner(line).useDelimiter("[^\\p{L}\\p{M}]+");

([^...] means "any character not listed here"; \p{...} with a lowercase p matches a Unicode category; L is the category of letters and M the combining diacritical marks, such as accents.)
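A sketch of this delimiter in use. It keeps accented words intact, including ones spelled with a separate combining accent (which is what \p{M} is for), but note a side effect: digits and apostrophes are not letters, so "don't" splits into "don" and "t" and numbers disappear. The class and method names (LetterDelimiter, words) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class LetterDelimiter {
    // Any run of characters that are neither Unicode letters (\p{L})
    // nor combining marks (\p{M}) acts as one delimiter.
    static final String NON_LETTERS = "[^\\p{L}\\p{M}]+";

    static List<String> words(String line) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(line).useDelimiter(NON_LETTERS)) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "\u00e9" is a precomposed e-acute; "e\u0301" is e plus a combining acute accent.
        System.out.println(words("Voil\u00e0, a caf\u00e9 (cafe\u0301)!"));
        // Apostrophes and digits are delimiters too: [don, t, stop, ever]
        System.out.println(words("don't stop 4 ever"));
    }
}
```

Unlike String.split, Scanner skips a leading delimiter, so input starting with punctuation does not produce an empty first word.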

Upvotes: 0

Prince John Wesley

Reputation: 63698

    Scanner line_scanner = new Scanner(line).useDelimiter("[.,:;()?!\"\\s]+");
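Here \s sits inside the character class next to the punctuation, so any run of those characters and/or whitespace is one delimiter. A sketch of feeding the resulting words into a HashMap, assuming the map is meant to count occurrences (the question does not say what the values are); the class and method names (WordCount, countWords) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class WordCount {
    // One or more of the listed punctuation characters or whitespace (\s).
    static final String DELIMS = "[.,:;()?!\"\\s]+";

    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        try (Scanner sc = new Scanner(text).useDelimiter(DELIMS)) {
            while (sc.hasNext()) {
                // merge() stores 1 for a new word, otherwise adds 1 to its count
                counts.merge(sc.next(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Counts The=2, cat=2, sat=2, again=1, Why=1 (HashMap order is unspecified)
        System.out.println(countWords("The cat sat. The cat (again) sat!\n\"Why?\""));
    }
}
```

Counting is case-sensitive here ("The" and "the" are different keys), and characters outside the class, such as hyphens and apostrophes, stay attached to the word.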

Upvotes: 4
