Arno

Reputation: 73

Can I use regex to match every third occurrence of a specific character?

I have a string containing some delimited values:

1.95;1.99;1.78;10.9;11.45;10.5;25.95;26;45;21.2

What I'd like to achieve is a split at every third occurrence of a semicolon, so my resulting String[] should contain this:

result[0] = "1.95;1.99;1.78";
result[1] = "10.9;11.45;10.5";
result[2] = "25.95;26;45";
result[3] = "21.2";

So far I've tried several regex solutions, but the best I could manage was a pattern that matches the values between the semicolons. For example:

(?<=^|;)[^;]*;?[^;]*;?[^;]*

This matches the values I want to keep rather than the delimiters, which makes it impossible to use with split(). Or am I missing something?

Unfortunately, I can only supply the pattern itself; I have no way to add a loop over the results of the pattern above.

Upvotes: 5

Views: 5400

Answers (4)

Eric B.

Reputation: 24441

Would something like:

 ([0-9.]*;){3}

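not work for your needs? The caveat is that each match ends with a trailing ;, although you might be able to tweak the expression to trim that off. A final group with fewer than three values (21.2 in your example) is also not matched, because the expression requires three semicolons.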

I just reread your question, and although this simple expression will work for matching groups, if you need to supply it to the split() method, it will unfortunately not do the job.

Upvotes: 0

yjshen

Reputation: 6693

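import java.util.regex.Pattern;

// Intent: split on a semicolon only when three fields lie between the end of the previous match (\G) and that semicolon.
String re = "(?<=\\G[^;]*;[^;]*;[^;]*);";
String text = "1.95;1.99;1.78;10.9;11.45;10.5;25.95;26;45;21.2";
String[] result = Pattern.compile(re).split(text);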

Now the result is what you want.
Hint: \G in Java's regex is a boundary matcher like ^; it matches at the end of the previous match.
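
To make the \G hint concrete, here is a minimal sketch of how it anchors each find() to the end of the previous match. The class name LastMatchDemo and the helper findAll are made up for this illustration, and the example shows \G on its own rather than inside the lookbehind above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LastMatchDemo {
    // Hypothetical helper: collects every match that find() returns, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Without \G, find() may skip over the semicolon to reach later digits.
        System.out.println(findAll("\\d", "12;34"));     // [1, 2, 3, 4]
        // With \G, each match must begin where the previous one ended,
        // so matching stops at the first character that is not a digit.
        System.out.println(findAll("\\G\\d", "12;34"));  // [1, 2]
    }
}
```

On the first find() call, \G matches at the start of the input, which is why it is often compared to ^.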

Upvotes: 2

Wacław Borowiec

Reputation: 700

You can try something like this instead:

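import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "1.95;1.99;1.78;10.9;11.45;10.5;25.95;26;45;21.2";
// Lazily match three fields, each followed by a semicolon.
Pattern p = Pattern.compile(".*?;.*?;.*?;");
Matcher m = p.matcher(s);
int lastEnd = 0; // start at 0 so substring() is safe when nothing matches
while (m.find()) {
    System.out.println(m.group()); // each group still ends with ';'
    lastEnd = m.end();
}
// Whatever follows the last complete group of three.
System.out.println(s.substring(lastEnd));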

Upvotes: 1

Tim Pietzcker

Reputation: 336418

You are correct. Since Java doesn't support indefinite-length lookbehind assertions (which you need if you want to check whether there are 3, 6, 9 or 3*n values before the current semicolon), you can't use split() for this. Your regex works perfectly with a "find all" approach, but if you can't apply that in your situation, you're out of luck.
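
For readers who can loop over matches, the "find all" approach might look like the sketch below. It uses a simplified pattern rather than the one from the question and assumes every field is non-empty (an empty field such as ";;" would be skipped). The class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupsOfThree {
    // One to three non-empty fields separated by semicolons, with no trailing semicolon.
    private static final Pattern GROUP = Pattern.compile("[^;]+(?:;[^;]+){0,2}");

    // Hypothetical helper: returns the input cut into chunks of at most three fields.
    static List<String> groupsOfThree(String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = GROUP.matcher(input);
        while (m.find()) {
            groups.add(m.group());
        }
        return groups;
    }

    public static void main(String[] args) {
        String text = "1.95;1.99;1.78;10.9;11.45;10.5;25.95;26;45;21.2";
        for (String group : groupsOfThree(text)) {
            System.out.println(group);
        }
    }
}
```

Because the quantifier is greedy, each match takes three fields when they are available, and the last match picks up whatever remains (here the single value 21.2).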

In other languages (.NET-based ones, for example), the following regex would work:

;(?<=^(?:[^;]*;[^;]*;[^;]*;)*)

Upvotes: 0
