Reputation: 13
I need to turn a string such as "This and <those> are." into a string array of the form ["This and ", "<those>", " are."]. I have been trying to use String.split(), and so far I have this regex:
"(?=[<>])"
However, this only gets me ["This and ", "<those", "> are."]. I can't figure out a regex that keeps both brackets in the same element. The split must also not apply when there are spaces between the brackets: for instance, "This and <hey there> are." should stay as the single element ["This and <hey there> are."]. Ideally I'd like to rely solely on split() for this operation. Can anyone point me in the right direction?
Upvotes: 0
Views: 46
Reputation: 102978
Not really possible with split() alone. The 'separator' has to match 0 characters, so it must consist entirely of lookahead/lookbehind, and lookbehind in Java needs a bounded length. To split right after the closing '>' you would have to look back arbitrarily far into the string to know whether a space occurs inside the brackets. So what you want, in general? Not doable.
Just write a regexp that FINDS the construct you want; that's a lot simpler. Simply Pattern.compile("<\\w+>"), taking a few liberties with what you intend a thing-in-brackets to look like. If it truly can be ANYTHING except spaces and the closing bracket, "<[^ >]+>" is what you want.
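To make the difference between the two patterns concrete, here is a minimal sketch that runs both over the same input. The class name PatternComparison, the helper findTokens, and the sample string are made up for illustration; only the two regexes come from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternComparison {
    // Word characters only: letters, digits, underscore.
    static final Pattern WORD_ONLY = Pattern.compile("<\\w+>");
    // Anything except a space or the closing bracket.
    static final Pattern ANY_NON_SPACE = Pattern.compile("<[^ >]+>");

    // Hypothetical helper: collects every match of p in input, in order.
    static List<String> findTokens(Pattern p, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) found.add(m.group());
        return found;
    }

    public static void main(String[] args) {
        String sample = "a <those> b <two-words> c <hey there>";
        System.out.println(findTokens(WORD_ONLY, sample));
        System.out.println(findTokens(ANY_NON_SPACE, sample));
    }
}
```

With this sample, the word-only pattern finds just <those>, the broader one also finds <two-words> because of the hyphen, and neither matches <hey there> because of the space.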
Then, just loop through, finding as you go:
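    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    private static final Pattern TOKEN_FINDER = Pattern.compile("<\\w+>");

    static List<String> parse(String in) {
        Matcher m = TOKEN_FINDER.matcher(in);
        // No token at all: the whole input is a single element.
        if (!m.find()) return List.of(in);
        var out = new ArrayList<String>();
        int pos = 0; // end of the previous token
        do {
            int s = m.start();
            // Plain text between the previous token and this one.
            if (s > pos) out.add(in.substring(pos, s));
            out.add(m.group());
            pos = m.end();
        } while (m.find());
        // Trailing plain text after the last token.
        if (pos < in.length()) out.add(in.substring(pos));
        return out;
    }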
Let's try it:
    System.out.println(parse("This and <those> are."));
    System.out.println(parse("This and <hey there> are."));
    System.out.println(parse("<edgecase>"));
    System.out.println(parse("<edgecase>2"));
    System.out.println(parse("3<edgecase>"));
prints:
    [This and , <those>,  are.]
    [This and <hey there> are.]
    [<edgecase>]
    [<edgecase>, 2]
    [3, <edgecase>]
Seems like what you wanted.
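If split() is a hard requirement and you can live with an upper bound on how long a bracketed token may be, a bounded lookbehind gets close. The following is only a sketch under that assumption: the class SplitSketch, the limit of 50 characters, and the choice to also exclude '<' inside a token are mine, not part of the question.

```java
import java.util.Arrays;

public class SplitSketch {
    // Assumed upper bound on the text between the brackets (hypothetical).
    static final int MAX_TOKEN_LEN = 50;
    // A token: '<', then 1..MAX_TOKEN_LEN characters that are not space, '<' or '>', then '>'.
    static final String TOKEN = "<[^ <>]{1," + MAX_TOKEN_LEN + "}>";
    // Zero-width boundary: just before a token (lookahead)
    // or just after one (bounded lookbehind).
    static final String BOUNDARY = "(?=" + TOKEN + ")|(?<=" + TOKEN + ")";

    static String[] splitTokens(String in) {
        return in.split(BOUNDARY);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitTokens("This and <those> are.")));
        System.out.println(Arrays.toString(splitTokens("This and <hey there> are.")));
    }
}
```

Tokens longer than the assumed bound are left unsplit, so the find-based loop above remains the more robust choice when no such limit exists.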
Upvotes: 1