Reputation: 115
I'm trying to split a string using a regex. So far I have:
String[] words = a.replaceAll("[^a-zA-Z ]","").toLowerCase().split("\\s+");
It's almost what I want, but I also need to split the text wherever there is a newline character in the string. (By the way, should I actually use newline or return? What is the actual difference?)
To clarify, my input is:
this is a,
sample of
a file.
After splitting and doing a routine that sorts the words and counts occurrences of each, I should be getting this:
a: 2
file: 1
is: 1
of: 1
sample: 1
this: 1
Instead, I get:
asample: 1
file: 1
is: 1
ofa: 1
this: 1
How should I correct my regular expression to split at newlines as well?
Upvotes: 0
Views: 71
Reputation: 423
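Your replaceAll call deletes every character that is not a letter or a space, and that includes the newline, so the words on either side of it get glued together. Use a space instead of the empty string as the second argument of replaceAll and it should work: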
replaceAll("[^a-zA-Z ]"," ")
Or you can make it more efficient and avoid runs of unnecessary spaces in the string returned by replaceAll by using the '+' quantifier, as suggested by Casimir.
Both would work just fine in your case.
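To make the fix concrete, here is a minimal sketch of the whole pipeline run on the sample input from the question. The class name WordCountSketch, the helper countWords, and the use of a TreeMap for the sorted counts are my own assumptions for illustration; the question does not show its counting routine.

```java
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    // Hypothetical helper: returns word counts sorted alphabetically by word.
    static Map<String, Integer> countWords(String text) {
        String[] words = text
                .replaceAll("[^a-zA-Z ]", " ") // non-letters (including '\n') become spaces
                .toLowerCase()
                .trim()                        // drop leading/trailing spaces before splitting
                .split("\\s+");
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            if (!w.isEmpty()) {                // "".split(...) yields one empty token
                counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String a = "this is a,\nsample of\na file.";
        countWords(a).forEach((word, n) -> System.out.println(word + ": " + n));
    }
}
```

With the sample input this should print a: 2, file: 1, is: 1, of: 1, sample: 1, this: 1, one entry per line.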
Upvotes: 0
Reputation: 89557
Change your replaceAll call like this:
a.replaceAll("[^a-zA-Z]+"," ")
Or, as suggested by Alexander, why not match the words directly? That is more straight to the point.
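As a rough illustration of what the '+' changes, the sketch below compares the intermediate strings produced by the per-character and per-run replacements. The class and method names (ReplaceVariantsSketch, perChar, perRun) are made up for this example. It also shows one caveat I would watch for: if the text starts with a non-letter, the result starts with a space and split returns an empty first token, so a trim() before splitting may be needed.

```java
import java.util.Arrays;

class ReplaceVariantsSketch {
    // Each non-letter, non-space character becomes its own space.
    static String perChar(String s) {
        return s.replaceAll("[^a-zA-Z ]", " ");
    }

    // Each run of non-letters (spaces included) collapses to a single space.
    static String perRun(String s) {
        return s.replaceAll("[^a-zA-Z]+", " ");
    }

    public static void main(String[] args) {
        String a = "this is a,\nsample of\na file.";
        System.out.println("[" + perChar(a) + "]"); // two spaces between "a" and "sample"
        System.out.println("[" + perRun(a) + "]");  // single spaces throughout

        // Leading punctuation leaves a leading space, hence an empty first token.
        String lead = perRun("...this is");
        System.out.println(Arrays.toString(lead.split("\\s+")));
        System.out.println(Arrays.toString(lead.trim().split("\\s+")));
    }
}
```

Either variant gives the same words after split("\\s+") on the sample input; the difference is only in how many spaces the intermediate string carries.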
Upvotes: 1
Reputation: 2810
Use the regexp \b[A-Za-z]+\b to find the word matches.
http://regexr.com/3ae1c
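In Java, matching the words directly could look roughly like the sketch below, which uses java.util.regex.Pattern and Matcher instead of replaceAll plus split. The class name WordFinderSketch and the helper findWords are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordFinderSketch {
    // Same pattern as above, with backslashes escaped for a Java string literal.
    private static final Pattern WORD = Pattern.compile("\\b[A-Za-z]+\\b");

    // Hypothetical helper: collects every match, lowercased, in input order.
    static List<String> findWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group().toLowerCase());
        }
        return words;
    }

    public static void main(String[] args) {
        String a = "this is a,\nsample of\na file.";
        System.out.println(findWords(a));
    }
}
```

One difference from the replaceAll approach: \b treats digits as word characters, so a token such as abc123 is skipped entirely rather than reduced to abc. For purely alphabetic text like the sample, both approaches find the same words.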
Upvotes: 2