Snowflake

Reputation: 115

Regular expression for splitting a string

I'm trying to split a string using a regex. So far I have:

String[] words = a.replaceAll("[^a-zA-Z ]","").toLowerCase().split("\\s+");

This is almost what I want, but I also need to split the text wherever there is a newline character in the string. (By the way, should I be matching newline or carriage return? What is the actual difference?)

To clarify, my input is:

this is a,
sample of
a file.

After splitting and doing a routine that sorts the words and counts occurrences of each, I should be getting this:

a: 2
file: 1
is: 1
of: 1
sample: 1
this: 1

Instead, I get:

asample: 1
file: 1
is: 1
ofa: 1
this: 1

How should I correct my regular expression to split at newlines as well?

Upvotes: 0

Views: 71

Answers (3)

santiago92

Reputation: 423

Use a space as the second argument of replaceAll, so that each removed character (including the newline) is replaced by a space instead of being deleted:

replaceAll("[^a-zA-Z ]"," ") 

Or you can make it more efficient, and avoid runs of unnecessary spaces in the string returned by replaceAll, by using the '+' quantifier as suggested by Casimir.

Both work fine in your case.
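To see why the original call glues words together, here is a minimal sketch comparing it with the fix above. The class and method names (NewlineSplitDemo, original, fixed) are made up for illustration. Note that `\\s+` in split already matches both `\n` and `\r`; the problem is only that replaceAll deletes the newline before split ever sees it.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative, not from the question.
class NewlineSplitDemo {

    // The question's version: '\n' is not in [a-zA-Z ], so it is deleted
    // and the words on either side of it are joined.
    static String[] original(String a) {
        return a.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");
    }

    // The fix: replace each unwanted character with a space instead.
    static String[] fixed(String a) {
        return a.replaceAll("[^a-zA-Z ]", " ").toLowerCase().split("\\s+");
    }

    public static void main(String[] args) {
        String input = "this is a,\nsample of\na file.";
        System.out.println(Arrays.toString(original(input)));
        // [this, is, asample, ofa, file]
        System.out.println(Arrays.toString(fixed(input)));
        // [this, is, a, sample, of, a, file]
    }
}
```

One caveat: if the text starts with a non-letter, the fixed version yields an empty first element, so you may want to call trim() before split.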

Upvotes: 0

Casimir et Hippolyte

Reputation: 89557

Change your replaceAll call like this:

 a.replaceAll("[^a-zA-Z]+"," ")

Or, as Alexander suggests, why not match the words directly? That is more to the point.
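For what it's worth, here is a rough sketch of the whole pipeline with that replaceAll, including the sort-and-count step from the question. The counting routine is not shown in the question, so the TreeMap approach and the names WordCountDemo and countWords are my own assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; the counting routine is assumed, not the asker's.
class WordCountDemo {

    static Map<String, Integer> countWords(String a) {
        // TreeMap keeps its keys sorted, which gives the alphabetical order.
        Map<String, Integer> counts = new TreeMap<>();

        // Every run of non-letters (commas, dots, newlines) becomes one space.
        String cleaned = a.replaceAll("[^a-zA-Z]+", " ").trim().toLowerCase();
        if (cleaned.isEmpty()) {
            return counts;
        }
        for (String word : cleaned.split(" ")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String input = "this is a,\nsample of\na file.";
        countWords(input).forEach((word, n) -> System.out.println(word + ": " + n));
        // a: 2
        // file: 1
        // is: 1
        // of: 1
        // sample: 1
        // this: 1
    }
}
```

Because the '+' collapses each run of unwanted characters into a single space, a plain split(" ") is enough after trim().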

Upvotes: 1

Alexander Shutau

Reputation: 2810

Use the regex \b[A-Za-z]+\b to find the words directly, rather than splitting: http://regexr.com/3ae1c
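In Java that could look roughly like the sketch below, using Pattern and Matcher from java.util.regex. The class and method names (WordMatchDemo, findWords) are placeholders I made up, and the lowercasing is carried over from the question's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class WordMatchDemo {

    // Compile once and reuse; \b is a word boundary.
    private static final Pattern WORD = Pattern.compile("\\b[A-Za-z]+\\b");

    static List<String> findWords(String a) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(a);
        // find() moves to the next match; group() returns the matched text.
        while (m.find()) {
            words.add(m.group().toLowerCase());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(findWords("this is a,\nsample of\na file."));
        // [this, is, a, sample, of, a, file]
    }
}
```

Be aware that this is not exactly equivalent to the replaceAll approach: because digits and underscores count as word characters for \b, a token such as abc123 is skipped entirely rather than reduced to abc.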

Upvotes: 2
