Reputation: 1082
I'm trying to tokenize strings of the following format:
"98, BA71V-CP204L (p32, p30), BA71V-CP204L (p32, p30), , 0, 125900, 126505"
"91, BA71V-B175L, BA71V-B175L, , 0, 108467, 108994, -, 528, 528"
Each of the tokens will then be stored in a string array. The strings should be split on "," except for commas inside ( , ), so that the contents of the parentheses stay in a single token. Some tokens may contain only a space.
I'm thinking the regex would find a comma, then check whether it is surrounded by an opening parenthesis on the left and a closing parenthesis on the right. Since such a comma is enclosed in ( ), it would not be used to tokenize.
I could write a regex for the opposite case, but what about when neither side of the delimiter contains "(" or ")"?
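The check described above (is this comma inside a pair of parentheses?) can also be done without a regex, by counting how many parentheses are open while scanning the line. This is a swapped-in technique, a parenthesis-depth counter, shown only as a minimal sketch; it assumes the parentheses are balanced, and the names `DepthSplit` and `splitOutsideParens` are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DepthSplit {

    // Splits on commas that are not inside ( ... ) by tracking how many
    // parentheses are currently open. Assumes balanced parentheses;
    // nested pairs are handled because the counter only returns to 0
    // after the outermost ')' is seen.
    static String[] splitOutsideParens(String s) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // number of '(' not yet closed
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')' && depth > 0) {
                depth--;
            }
            if (c == ',' && depth == 0) {
                // Top-level comma: finish the current token.
                tokens.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        tokens.add(current.toString()); // text after the last top-level comma
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String s = "98, BA71V-CP204L (p32, p30), BA71V-CP204L (p32, p30), , 0, 125900, 126505";
        System.out.println(Arrays.toString(splitOutsideParens(s)));
    }
}
```

Unlike `String.split`, this keeps a trailing empty token if the line ends with a comma, and tokens keep their leading spaces.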
Currently I am using:
StringTokenizer tokaniza = new StringTokenizer(content,","); //no regex
but I feel a regex would work better with
content.split();
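For comparison, here is a minimal sketch of what the current `StringTokenizer` approach produces on the first sample line. The class name `TokenizerDemo` and the helper `tokenize` are hypothetical. Because every comma is a delimiter, the text inside ( ) ends up split across two tokens:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {

    // Collects the tokens produced by StringTokenizer with "," as the only delimiter.
    static List<String> tokenize(String content) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokaniza = new StringTokenizer(content, ","); // no regex
        while (tokaniza.hasMoreTokens()) {
            tokens.add(tokaniza.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String content = "98, BA71V-CP204L (p32, p30), BA71V-CP204L (p32, p30), , 0, 125900, 126505";
        // The commas inside ( ) are delimiters too, so "(p32" and " p30)"
        // come out as separate tokens.
        System.out.println(tokenize(content));
    }
}
```

The `StringTokenizer` Javadoc describes it as a legacy class and recommends `String.split` or the `java.util.regex` package for new code.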
Upvotes: 0
Views: 92
Reputation: 174706
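Use a negative lookahead assertion, so that a comma is only treated as a delimiter when it is not followed by a ")" that appears before any "(" (that is, when the comma is not inside parentheses).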
import java.util.Arrays; // needed for Arrays.toString
String s = "98, BA71V-CP204L (p32, p30), BA71V-CP204L (p32, p30), , 0, 125900, 126505";
String[] parts = s.split(",(?![^()]*\\))");
System.out.println(Arrays.toString(parts));
Output:
[98,  BA71V-CP204L (p32, p30),  BA71V-CP204L (p32, p30),  ,  0,  125900,  126505]
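A self-contained version of this approach, run on both sample lines from the question, might look like the sketch below. The class and method names are hypothetical, and it assumes parentheses are balanced and never nested: for nested input such as `a (b, (c, d)), e`, the comma after `b` is followed by `(` before any `)`, so the lookahead would not protect it.

```java
import java.util.Arrays;

class LookaheadSplit {

    // A comma is a delimiter only if it is NOT followed by a run of
    // non-parenthesis characters ending in ')', i.e. it is not inside ( ).
    // Assumes balanced, non-nested parentheses.
    private static final String TOP_LEVEL_COMMA = ",(?![^()]*\\))";

    static String[] tokenize(String line) {
        return line.split(TOP_LEVEL_COMMA);
    }

    public static void main(String[] args) {
        String[] lines = {
            "98, BA71V-CP204L (p32, p30), BA71V-CP204L (p32, p30), , 0, 125900, 126505",
            "91, BA71V-B175L, BA71V-B175L, , 0, 108467, 108994, -, 528, 528"
        };
        for (String line : lines) {
            String[] parts = tokenize(line);
            // Tokens keep their leading space; the empty field comes out as " ".
            System.out.println(parts.length + " " + Arrays.toString(parts));
        }
    }
}
```

Note that `String.split(regex)` discards trailing empty strings; if a line can end with an empty field, `split(regex, -1)` keeps them.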
Upvotes: 2
Reputation: 1111
Try a split using:
(?<!\(\w{1,4}),(?!\s*\w*\)).*?
One caveat: Java doesn't support unbounded repetition (such as * or +) inside lookbehinds, so you have to bound the number of characters after the opening parenthesis (here \w{1,4}). In other words, the lookbehind stops protecting the comma once the word characters between "(" and the comma exceed 4.
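A sketch of the same pattern in Java source, where every backslash has to be doubled inside the string literal. The class name and the second input are hypothetical. The second input illustrates the limit above: the text after "(" is longer than 4 word characters, and the text before ")" contains a non-word character (`-`), so neither the lookbehind nor the lookahead rules the inner comma out and it is split on.

```java
import java.util.Arrays;

class LookbehindSplit {

    // The answer's pattern as a Java string literal:
    //   (?<!\(\w{1,4})  comma is not preceded by "(" plus 1 to 4 word characters
    //   (?!\s*\w*\))    comma is not followed by optional spaces, word characters and ")"
    //   .*?             lazy, so it matches the empty string here
    private static final String REGEX = "(?<!\\(\\w{1,4}),(?!\\s*\\w*\\)).*?";

    static String[] tokenize(String line) {
        return line.split(REGEX);
    }

    public static void main(String[] args) {
        String s = "98, BA71V-CP204L (p32, p30), BA71V-CP204L (p32, p30), , 0, 125900, 126505";
        System.out.println(Arrays.toString(tokenize(s)));
        // Hypothetical input that exceeds the 4-character bound.
        System.out.println(Arrays.toString(tokenize("A (abcdef, xyz-1), B")));
    }
}
```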
Upvotes: 1