manni

Reputation: 113

java string split regular expression retain delimiter

Given an input string such as

"abbbcaababbbcaaabbca"

I want to split such a string into an array of the groups "bca", "ab", "a", and "b".

So the above example would return

"ab", "b", "bca", "ab", "ab", "b", "bca", "a", "ab", "bca".

I have a 29-line piece of code with nested loops that accomplishes this task (it returns an ArrayList). However, it would be nice to get this done with a one-line regular expression.

Can this task be accomplished using the following method?

stringVar.split("regEX") 

Upvotes: 1

Views: 131

Answers (3)

hwnd

Reputation: 70732

It can be accomplished using lookaround assertions, but @falsetru's answer is preferred over splitting.

String[] ss = "abbbcaababbbcaaabbca".split("(?<=bca|ab)|(?<=a(?=ab))|(?<=b(?=bca))");
System.out.println(Arrays.toString(ss)); //=> [ab, b, bca, ab, ab, b, bca, a, ab, bca]

If the string contains letters only, you could shorten this using a backreference.

String[] ss = "abbbcaababbbcaaabbca".split("(?<=bca|ab)|(?<=(.)(?=\\1))");
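
One way to read the first pattern is as three zero-width split rules joined by |. The sketch below rebuilds the same regex from named parts so each rule can be read on its own; the class, constant, and method names are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

class LookaroundSplitSketch {
    // Split right after a complete "bca" or "ab".
    static final String AFTER_TOKEN = "(?<=bca|ab)";
    // Split after an 'a' that is directly followed by "ab".
    static final String A_BEFORE_AB = "(?<=a(?=ab))";
    // Split after a 'b' that is directly followed by "bca".
    static final String B_BEFORE_BCA = "(?<=b(?=bca))";

    static String[] tokens(String input) {
        // Joining the rules with | gives the same pattern as the one-liner.
        return input.split(AFTER_TOKEN + "|" + A_BEFORE_AB + "|" + B_BEFORE_BCA);
    }

    public static void main(String[] args) {
        // Prints [ab, b, bca, ab, ab, b, bca, a, ab, bca]
        System.out.println(Arrays.toString(tokens("abbbcaababbbcaaabbca")));
    }
}
```

The split at the very end of the string (after the last "bca") produces a trailing empty string, which split drops by default.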

Upvotes: 3

Pshemo

Reputation: 124225

It looks like you are trying to split between identical characters. In that case you can use

stringVar.split("(?<=(\\w))(?=\\1)") 

but it will result in ab, b, bca, abab, b, bca, a, ab, bca, which means that abab will not be split.

If you want, you can add a case that also splits after ab or bca:

stringVar.split("(?<=(\\w))(?=\\1)|(?<=ab|bca)") 

which will now return ab, b, bca, ab, ab, b, bca, a, ab, bca.
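
The two patterns above can be compared side by side. This is a minimal sketch with hypothetical class and method names; it assumes the input contains only word characters, since \w is what the capture group matches.

```java
import java.util.Arrays;

class RepeatSplitSketch {
    // Rule 1: split between two identical word characters.
    static final String BETWEEN_REPEATS = "(?<=(\\w))(?=\\1)";
    // Rule 2: also split right after a complete "ab" or "bca".
    static final String AFTER_TOKEN = "(?<=ab|bca)";

    static String[] splitOnRepeats(String s) {
        return s.split(BETWEEN_REPEATS);
    }

    static String[] splitOnRepeatsAndTokens(String s) {
        return s.split(BETWEEN_REPEATS + "|" + AFTER_TOKEN);
    }

    public static void main(String[] args) {
        String s = "abbbcaababbbcaaabbca";
        // Prints [ab, b, bca, abab, b, bca, a, ab, bca]
        System.out.println(Arrays.toString(splitOnRepeats(s)));
        // Prints [ab, b, bca, ab, ab, b, bca, a, ab, bca]
        System.out.println(Arrays.toString(splitOnRepeatsAndTokens(s)));
    }
}
```

Unlike in some other languages, Java's split does not include captured groups in the result, so the (\\w) group does not add extra elements.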

Upvotes: 1

falsetru

Reputation: 369064

Not a one-liner, but you can do it using Matcher.find in a loop:

ArrayList<String> result = new ArrayList<String>();
String s = "abbbcaababbbcaaabbca";
Matcher m = Pattern.compile("bca|ab|a|b").matcher(s);
// Each find() call advances to the next token; group() returns its text.
while (m.find())
    result.add(m.group());

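
Wrapped in a method with its imports, the loop above might look like the sketch below. The class and method names are hypothetical. Java tries alternatives left to right and takes the first one that matches, so the longer tokens bca and ab are listed before a and b; with the order reversed, every match would be a single character. Characters that match no alternative are skipped by find().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenMatcherSketch {
    // Longer alternatives first: the first alternative that matches wins.
    private static final Pattern TOKEN = Pattern.compile("bca|ab|a|b");

    static List<String> tokenize(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        // Each successful find() yields the next token, scanning left to right.
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [ab, b, bca, ab, ab, b, bca, a, ab, bca]
        System.out.println(tokenize("abbbcaababbbcaaabbca"));
    }
}
```

One difference from the split-based answers: for an input such as "ba", this returns [b, a], whereas the lookaround patterns leave "ba" as a single element, because none of their split rules applies between a b and a following a.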

Upvotes: 4
