Bear

Reputation: 5152

Java split string with regex

I want to split a string, treating every non-alphabetic character as a separator.

String[] word_list = line.split("[^a-zA-Z]");

But with the following input:

11:11 Hello World

word_list contains many empty strings before "Hello" and "World".

Please tell me why. Thank you.

Upvotes: 2

Views: 1200

Answers (3)

rayd09

Reputation: 1897

Because your regular expression matches each individual non-alpha character. It would be like separating

",,,,,,Hello,World"

on commas.

You will want an expression that matches an entire sequence of non-alpha characters at once, such as:

line.split("[^a-zA-Z][^a-zA-Z]*")

I still think you will get one leading empty string with your example since it would be like separating ",Hello,World" if comma were your separator.
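As a rough sketch of the difference, the snippet below runs both patterns on the input from the question. It writes the run-matching pattern as "[^a-zA-Z]+", which should behave the same as "[^a-zA-Z][^a-zA-Z]*"; the class name SplitRuns and the method splitOnRuns are made up for this illustration.

```java
import java.util.Arrays;

class SplitRuns {
    // Hypothetical helper: treat each run of one or more non-letters as ONE separator.
    static String[] splitOnRuns(String line) {
        return line.split("[^a-zA-Z]+");
    }

    public static void main(String[] args) {
        String line = "11:11 Hello World";

        // Original pattern: every single non-letter is its own separator,
        // so the six leading non-letters yield six empty strings.
        String[] perChar = line.split("[^a-zA-Z]");
        System.out.println(perChar.length + " " + Arrays.toString(perChar));

        // Run-based pattern: "11:11 " is one separator, leaving one leading empty string.
        String[] perRun = splitOnRuns(line);
        System.out.println(perRun.length + " " + Arrays.toString(perRun));
    }
}
```

The run-based split still starts with one empty string because the input begins with a separator, so that element has to be skipped or filtered if it is unwanted.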

Upvotes: 2

Ken Wayne VanderLinde

Reputation: 19339

Here's your string, where each ^ character shows a match for [^a-zA-Z]:

11:11 Hello World
^^^^^^     ^

The split method finds each of these matches and returns the substrings between them. Since there are six consecutive matches before any useful data, you end up with six empty substrings (one before the first match and one between each adjacent pair of matches) before you get the string "Hello".

To prevent this, you can manually filter the result to ignore any empty strings.
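One possible way to do that filtering, assuming Java 8 or later for streams, is sketched below. The class name SplitFilter and the method words are made up for this example; a plain loop that skips empty strings would work just as well.

```java
import java.util.Arrays;

class SplitFilter {
    // Hypothetical helper: split on each non-letter, then drop the empty pieces.
    static String[] words(String line) {
        return Arrays.stream(line.split("[^a-zA-Z]"))
                .filter(s -> !s.isEmpty())   // discard empty substrings between adjacent separators
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Only the non-empty pieces "Hello" and "World" remain.
        System.out.println(Arrays.toString(words("11:11 Hello World")));
    }
}
```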

Upvotes: 2

Chetter Hummin

Reputation: 6817

Will the following do?

String[] word_list = line.replaceAll("[^a-zA-Z ]","").replaceAll(" +", " ").trim().split("[^a-zA-Z]");

What I am doing here is removing every character that is neither a letter nor a space, collapsing runs of spaces into a single space, and trimming leading and trailing spaces before doing the split.

Upvotes: 0
