Martin Petkov

Reputation: 420

Java String split regexp returns empty strings with multiple delimiters

I have a problem that I can't find an answer to here, so I'm asking.

I have a string and a set of delimiters. I want to create an array of strings from whatever sits between those delimiters (words, numbers, etc.). However, if two delimiters are next to one another, the split method returns an empty string between them.

I tested this with longer runs of consecutive delimiters and found that n delimiters in a row produce n-1 empty strings in the result array. In other words, if I have both "," and " " as delimiters, and the sentence "This is a very nice day, isn't it", then the result array looks like:

{... , "day", "", "isn't" ...}

I want to get those extra empty strings out and I can't figure out how to do that. A sample regex for the delimiters that I have is:

"[\\s,.-\\'\\[\\]\\(\\)]"

Also can you explain why there are extra empty strings in the result array?

P.S. I read some of the similar posts, which mention the second parameter of split (the limit). I tried negative, zero, and positive values, and I didn't get the result that I'm looking for. (One of the questions had an answer saying that -1 as the limit might solve the problem, but it didn't.)
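For reference, here is a minimal sketch of what the limit argument appears to control (the class name LimitDemo and the sample input are made up for illustration). It decides how many pieces are produced and whether trailing empty strings are kept; empty strings in the middle of the array show up either way, which would explain why no limit value helped:

```java
import java.util.Arrays;

public class LimitDemo {
    // Thin wrapper around String.split(regex, limit) with "," as the delimiter.
    static String[] splitWithLimit(String s, int limit) {
        return s.split(",", limit);
    }

    public static void main(String[] args) {
        String s = "a,,b,,";
        // limit 0 (same as split(",")): trailing empty strings are dropped.
        System.out.println(Arrays.toString(splitWithLimit(s, 0)));  // [a, , b]
        // Negative limit: trailing empty strings are kept.
        System.out.println(Arrays.toString(splitWithLimit(s, -1))); // [a, , b, , ]
        // Positive limit: at most that many pieces; the last one holds the rest.
        System.out.println(Arrays.toString(splitWithLimit(s, 2)));  // [a, ,b,,]
    }
}
```

In all three cases the empty string between the two commas after "a" is still there.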

Upvotes: 2

Views: 1419

Answers (4)

MChaker

Reputation: 2649

If you want to get rid of empty strings, you can use the Splitter class from the Guava library.

on method:

Returns a splitter that uses the given fixed string as a separator.

Example (ignoring empty strings):

// requires: import com.google.common.base.Splitter;
System.out.println(
        Splitter.on(',')
                .trimResults()
                .omitEmptyStrings()
                .split("foo,bar,,   qux"));

Output:

[foo, bar, qux]

onPattern method:

Returns a splitter that considers any subsequence matching a given pattern (regular expression) to be a separator.

Example (ignoring empty strings):

// requires: import com.google.common.base.Splitter;
System.out.println(
        Splitter.onPattern("([,.|])")
                .trimResults()
                .omitEmptyStrings()
                .split("foo|bar,,  qux.hi"));

Output:

[foo, bar, qux, hi]

For more details, consult Splitter documentation.
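If adding Guava is not an option, roughly the same effect can be sketched with the JDK alone (Java 8 or later). This is not Splitter itself, just a plain split followed by trimming and filtering; the class and method names below are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class OmitEmptyDemo {
    // Split on the regex, trim each piece, and drop pieces that end up empty.
    static List<String> splitOmitEmpty(String input, String regex) {
        return Arrays.stream(input.split(regex))
                .map(String::trim)
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Same input and delimiters as the onPattern example.
        System.out.println(splitOmitEmpty("foo|bar,,  qux.hi", "[,.|]"));
        // prints [foo, bar, qux, hi]
    }
}
```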

Upvotes: 0

isnot2bad

Reputation: 24444

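Your regular expression matches exactly one character, so each delimiter in a run counts as a separate match and split returns the empty string between two adjacent matches. If you want it to match multiple separators at once, use a quantifier: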

String s = "This is a very nice day, isn't it";
String[] tokens = s.split("[\\s,.\\-\\[\\]()']+");

(Note the '+' at the end of the expression)
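To make the difference concrete, here is a small self-contained sketch. The class name SplitDemo and the shortened delimiter set (whitespace and comma only) are my own choices for illustration, not part of the question:

```java
import java.util.Arrays;

public class SplitDemo {
    // Single-character class: every delimiter is its own match.
    static String[] splitSingle(String s) {
        return s.split("[\\s,]");
    }

    // Same class with '+': a whole run of delimiters is one match.
    static String[] splitRun(String s) {
        return s.split("[\\s,]+");
    }

    public static void main(String[] args) {
        String s = "nice day, isn't it";
        System.out.println(Arrays.toString(splitSingle(s))); // [nice, day, , isn't, it]
        System.out.println(Arrays.toString(splitRun(s)));    // [nice, day, isn't, it]
    }
}
```

Without the quantifier, "," and the following " " are two matches with nothing between them, hence the empty string. One caveat: if the input itself starts with a delimiter, split still returns one leading empty string, even with '+'.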

Upvotes: 1

anubhava

Reputation: 785098

You can use this regex for splitting:

[\\s,.'\\[\\]()-]+
  • Keep an unescaped hyphen at the first or last position in the character class; otherwise it is treated as a range, like A-Z or 0-9
  • You must use the quantifier + to match one or more delimiters
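A quick sketch of the hyphen rule (the class and method names are hypothetical). Between two characters the hyphen builds a range, so "[,-/]" covers ',', '-', '.' and '/'; placed last it is a literal. A range whose end comes before its start, such as the ".-'" inside the regex from the question, should not even compile:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class HyphenDemo {
    // Hyphen in the middle: range from ',' (0x2C) to '/' (0x2F).
    static final Pattern RANGE = Pattern.compile("[,-/]");
    // Hyphen last: the three literal characters ',', '/' and '-'.
    static final Pattern LITERAL = Pattern.compile("[,/-]");

    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    // Returns false if the regex is rejected by Pattern.compile.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matches(RANGE, "."));   // true: '.' lies inside the range
        System.out.println(matches(LITERAL, ".")); // false: '.' is not listed
        System.out.println(compiles("[.-']"));     // false: '.' (0x2E) comes after '\'' (0x27)
    }
}
```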

Upvotes: 1

vojta

Reputation: 5651

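I think your problem is just the regex itself: it matches only one delimiter at a time. You should add a greedy quantifier (I also escaped the hyphen, since an unescaped '-' in the middle of a character class is read as a range):

"[\\s,.\\-\\'\\[\\]\\(\\)]+"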

See http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#sum

X+ ... X, one or more times

Upvotes: 1
