Reputation: 7585
In Java, given a text like foo <on> bar </on> thing <on> again</on> now, I want a regex with groups that, used with find(), gives me "foo", "bar" and an empty string on the first match, then "thing", "again" and "now" on the second.
If I use (.*?)<on>(.*?)</on>(?!<on>), I get only two groups per match (foo and bar, then thing and again), and I never get the trailing "now".
If I use (.*?)<on>(.*?)</on>((?!<on>)), I get foo, bar and an empty string, then thing, again and an empty string (where I want "now").
What is the magic formula?
Thanks.
Upvotes: 2
Views: 260
Reputation: 383716
If you insist on doing this with regex, you can try using \s*<[^>]*>\s* as the delimiter for split:
String text = "foo <on> bar </on> thing <on> again</on> now";
String[] parts = text.split("\\s*<[^>]*>\\s*");
System.out.println(java.util.Arrays.toString(parts));
// "[foo, bar, thing, again, now]"
I'm not sure this is exactly what you need, because the question isn't entirely clear.
Perhaps something like this was required:
String text = "1<on>2</on>3<X>4</X>5<X>6</X>7<on>8</on><X>9</X>10";
String[] parts = text.split("\\s*</?on>\\s*|<[^>]*>[^>]*>");
System.out.println(java.util.Arrays.toString(parts));
// prints "[1, 2, 3, 5, 7, 8, , 10]"
This doesn't handle nested tags. If you have those, you'd really want to dump regex and use an actual HTML parser.
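If you don't want the empty string in the middle of the array, just wrap the delimiter in a repeated non-capturing group, (?:delimiter)+, so that consecutive delimiters are consumed as one.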
String text = "1<on>2</on>3<X>4</X>5<X>6</X>7<on>8</on><X>9</X>10";
String[] parts = text.split("(?:\\s*</?on>\\s*|<[^>]*>[^>]*>)+");
System.out.println(java.util.Arrays.toString(parts));
// prints "[1, 2, 3, 5, 7, 8, 10]"
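If you would rather keep the groups-with-find() approach from the question, here is a minimal sketch. It assumes flat (non-nested) <on>...</on> pairs on a single line, and the class name OnTagGroups and the helper pieces are made up for illustration. Instead of forcing the trailing "now" into a third group, it picks up whatever follows the last match by remembering where that match ended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OnTagGroups {
    // Group 1: text before <on>; group 2: text between <on> and the next </on>.
    // Note: '.' does not match line breaks unless Pattern.DOTALL is used.
    private static final Pattern ON_BLOCK = Pattern.compile("(.*?)<on>(.*?)</on>");

    // Hypothetical helper: returns the trimmed pieces in document order.
    static List<String> pieces(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = ON_BLOCK.matcher(text);
        int end = 0; // index just past the last matched </on>
        while (m.find()) {
            out.add(m.group(1).trim());
            out.add(m.group(2).trim());
            end = m.end();
        }
        // Whatever is left after the last </on> (empty string if nothing is left).
        out.add(text.substring(end).trim());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pieces("foo <on> bar </on> thing <on> again</on> now"));
        // prints "[foo, bar, thing, again, now]"
    }
}
```

The last element is always the trailing text, so an input that ends right after </on> yields an empty string there.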
Upvotes: 2
Reputation: 36329
My recommendations
<on>
and after </on>
<on>
and next </on>
Use Matcher.find() to sequence through all occurrences, if possible. No need to do it all at once with one big fat regexp!
Upvotes: 0