Reputation: 612
Problem description
I am trying to split a string into separate strings with the split() method that the String class provides. The documentation tells me that it will split around matches of the argument, which is a regular expression. The delimiter that I use is a comma, but commas can also be escaped. The escape character I use is a forward slash / (just to make things easier by not using a backslash, because that requires additional escaping in both Java string literals and regular expressions).
For instance, the input might be this:
a,b/,b//,c///,//,d///,
And the output should be:
a
b,b/
c/,/
d/,
So, the string should be split at each comma, unless that comma is preceded by an odd number of slashes (1, 3, 5, 7, ..., ∞) because that would mean that the comma is escaped.
Possible solutions
My initial guess would be to split it like this:
String[] strings = longString.split("(?<![^/](//)*/),");
but that is not allowed because Java doesn't allow look-behind groups of unbounded length. I could limit the recurrence to, say, 2000 by replacing the * with {0,2000}:
String[] strings = longString.split("(?<![^/](//){0,2000}/),");
but that still puts constraints on the input. So I decided to take the recurrence out of the look-behind group, and came up with this:
String[] strings = longString.split("(?<!/)(?:(//)*),");
However, its output is the following list of strings:
a
b,b (the final slash is lacking in the output)
c/, (the final slash is lacking in the output)
d/,
Why are those slashes omitted in the 2nd and 3rd string, and how can I solve it (in Java)?
Upvotes: 3
Views: 1562
Reputation: 71538
If you don't mind another method with regex, I suggest using .matcher:
Pattern pattern = Pattern.compile("(?:[^,/]+|/.)+");
String test = "a,b/,b//,c///,//,d///,";
Matcher matcher = pattern.matcher(test);
while (matcher.find()) {
System.out.println(matcher.group().replaceAll("/(.)", "$1"));
}
Output:
a
b,b/
c/,/
d/,
This method will match everything except the delimiting commas (kind of the reverse). The advantage is that it doesn't rely on lookarounds.
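For reuse, the same idea can be wrapped in a method that returns a list. This is only a sketch: the class name MatcherSplitSketch and the method name splitEscaped are made up for illustration, and the pattern is the one shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherSplitSketch {
    // One field: runs of ordinary characters, or a slash followed by any character.
    private static final Pattern FIELD = Pattern.compile("(?:[^,/]+|/.)+");

    // Hypothetical helper: collects the unescaped fields of the input.
    static List<String> splitEscaped(String input) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = FIELD.matcher(input);
        while (matcher.find()) {
            // Drop each escaping slash and keep the character it protects.
            fields.add(matcher.group().replaceAll("/(.)", "$1"));
        }
        return fields;
    }

    public static void main(String[] args) {
        for (String field : splitEscaped("a,b/,b//,c///,//,d///,")) {
            System.out.println(field);
        }
    }
}
```

One thing to be aware of: because the pattern needs at least one character per match, empty fields are skipped, so "a,,b" gives two fields rather than three.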
Upvotes: 1
Reputation: 424983
You can achieve the split using a positive look behind for an even number of slashes preceding the comma:
String[] strings = longString.split("(?<=[^/](//){0,999999999}),");
But to display the output you want, you need a further step of removing the remaining escapes:
String longString = "a,b/,b//,c///,//,d///,";
String[] strings = longString.split("(?<=[^/](//){0,999999999}),");
for (String s : strings)
System.out.println(s.replaceAll("/(.)", "$1"));
Output:
a
b,b/
c/,/
d/,
Upvotes: 3
Reputation: 784998
You are pretty close. To overcome the lookbehind error, you can use this workaround:
String[] strings = longString.split("(?<![^/](//){0,99}/),");
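As for why the third attempt in the question loses slashes: in (?<!/)(?:(//)*), the pairs of slashes sit outside the look-behind, so they are part of the match, and split() throws away everything it matches. The sketch below lists what each pattern treats as a delimiter; the class name DelimiterProbe and the method name delimiters are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterProbe {
    // Hypothetical helper: returns every substring the pattern would consume as a delimiter.
    static List<String> delimiters(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "a,b/,b//,c///,//,d///,";
        // Attempt from the question: two of the three matches include "//".
        System.out.println(delimiters("(?<!/)(?:(//)*),", input));
        // Bounded look-behind: every match is a single comma.
        System.out.println(delimiters("(?<![^/](//){0,99}/),", input));
    }
}
```

With the sample input, the first pattern consumes ",", "//," and "//,", which is exactly where the slashes go missing. The bounded look-behind consumes only the commas, so the slashes stay in the pieces and can be unescaped afterwards.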
Upvotes: 3
Reputation: 40058
I love regexes, but wouldn't it be easy to write the code manually here, i.e.
boolean escaped = false;
for (int i = 0, len = s.length(); i < len; i++) {
    switch (s.charAt(i)) {
        case '/': // char literal; a slash toggles the escape state
            escaped = !escaped;
            break;
        case ',':
            if (!escaped) {
                // found a segment, do something with it
            }
            // Fallthrough!
        default:
            escaped = false;
    }
}
// handle last segment
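To make that concrete, here is one way the loop could be filled in. It is a variation rather than a literal completion: it builds each segment in a StringBuilder and unescapes as it goes, and the names ManualSplitSketch and splitEscaped are made up for illustration. Dropping an empty last segment and ignoring a dangling slash at the end are assumptions on my part.

```java
import java.util.ArrayList;
import java.util.List;

class ManualSplitSketch {
    // Hypothetical helper: splits on unescaped commas and removes the escaping slashes.
    static List<String> splitEscaped(String s) {
        List<String> segments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (int i = 0, len = s.length(); i < len; i++) {
            char c = s.charAt(i);
            if (escaped) {
                // The previous character was an escaping slash: take this one literally.
                current.append(c);
                escaped = false;
            } else if (c == '/') {
                // Start of an escape sequence; the slash itself is dropped.
                escaped = true;
            } else if (c == ',') {
                // Unescaped comma: close the current segment.
                segments.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // Handle the last segment (assumption: skip it when it is empty).
        if (current.length() > 0) {
            segments.add(current.toString());
        }
        return segments;
    }

    public static void main(String[] args) {
        for (String segment : splitEscaped("a,b/,b//,c///,//,d///,")) {
            System.out.println(segment);
        }
    }
}
```

Unlike a regex that requires at least one character per field, this version keeps empty segments in the middle of the input, so "a,,b" comes back as three segments.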
Upvotes: 0