Reputation: 5152
I want to split a string, treating every non-alphabetic character as a separator.
String[] word_list = line.split("[^a-zA-Z]");
But with the following input
11:11 Hello World
word_list contains many empty strings before "Hello" and "World".
Please kindly tell me why. Thank You.
Upvotes: 2
Views: 1200
Reputation: 1897
Because your regular expression matches each individual non-alpha character. It would be like separating
",,,,,,Hello,World"
on commas.
You will want an expression that matches an entire sequence of non-alpha characters at once such as:
line.split("[^a-zA-Z][^a-zA-Z]*")
You will still get one leading empty string with your example, since the input starts with a separator: it would be like separating ",Hello,World"
if comma were your separator.
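As a rough sketch of the difference (the class name SplitRunsDemo and the helper splitOnRuns are made up for illustration, and "[^a-zA-Z]+" is just a shorter spelling of the expression above):

```java
import java.util.Arrays;

public class SplitRunsDemo {
    // Hypothetical helper: treats each run of one or more non-letters
    // as a single separator, so "11:11 " counts as one match.
    static String[] splitOnRuns(String line) {
        return line.split("[^a-zA-Z]+");
    }

    public static void main(String[] args) {
        String line = "11:11 Hello World";

        // One separator per character: six empty strings come before "Hello".
        System.out.println(Arrays.toString(line.split("[^a-zA-Z]")));

        // One separator per run: only a single leading empty string remains.
        System.out.println(Arrays.toString(splitOnRuns(line)));
    }
}
```

The remaining leading empty string is there because the input begins with a separator, so it still has to be skipped or filtered out.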
Upvotes: 2
Reputation: 19339
Here's your string, where each ^ character shows a match for [^a-zA-Z]:
11:11 Hello World
^^^^^^     ^
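The split method finds each of these matches and returns the substrings around them: the one before the first ^, the ones between consecutive ^ characters, and the one after the last. Since there are six matches before any useful data, you end up with six empty substrings (one before the first match and five between the adjacent matches) before you get the string "Hello".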
To prevent this, you can manually filter the result to ignore any empty strings.
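A minimal sketch of that filtering, assuming a plain loop is acceptable (the class name FilterEmptyDemo and the helper words are hypothetical, not anything from the question):

```java
import java.util.ArrayList;
import java.util.List;

public class FilterEmptyDemo {
    // Hypothetical helper: splits on every non-letter, then drops
    // the empty tokens produced by adjacent separators.
    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        for (String token : line.split("[^a-zA-Z]")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("11:11 Hello World")); // [Hello, World]
    }
}
```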
Upvotes: 2
Reputation: 6817
Will the following do?
String[] word_list = line.replaceAll("[^a-zA-Z ]","").replaceAll(" +", " ").trim().split("[^a-zA-Z]");
What I am doing here is removing every character that is neither a letter nor a space, then collapsing multiple spaces into a single space and trimming the ends, and only then doing the split.
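A quick check of the one-liner, as a sketch (the class name StripThenSplitDemo and the helper stripThenSplit are made up for illustration). One thing to watch: punctuation is deleted rather than treated as a separator, so letters on either side of it get joined into one word.

```java
import java.util.Arrays;

public class StripThenSplitDemo {
    // The one-liner from above, wrapped in a hypothetical helper.
    static String[] stripThenSplit(String line) {
        return line.replaceAll("[^a-zA-Z ]", "") // drop everything except letters and spaces
                   .replaceAll(" +", " ")        // collapse runs of spaces
                   .trim()                       // remove leading and trailing spaces
                   .split("[^a-zA-Z]");          // only single spaces are left to split on
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(stripThenSplit("11:11 Hello World"))); // [Hello, World]
        System.out.println(Arrays.toString(stripThenSplit("foo:bar")));           // [foobar]
    }
}
```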
Upvotes: 0