Reputation: 11651
I have the following string:
bla {{bla {{bla bla {{afsaasg}} }} blabla}} {{bla bla}} bla
I would like to match
{{bla {{bla bla {{afsaasg}} }} blabla}}
with a regex,
but my regex
{{(.*?)}}
matches
{{bla {{bla bla {{afsaasg}}
Can anyone help?
Additional info: I expect to have no more than 2 brackets at the same time.
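To make the behaviour of the lazy quantifier concrete, here is a minimal Java sketch. The class and method names (LazyMatchDemo, firstLazyMatch) are made up for illustration, and the braces are escaped because Java's regex engine treats '{' as a quantifier metacharacter. The match starts at the first {{ and ends at the first }} that follows it, so it never reaches the closing delimiter that actually belongs to the outer block.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LazyMatchDemo {
    // Returns the first match of the lazy pattern, or null if there is none.
    static String firstLazyMatch(String text) {
        // Same pattern as in the question, with the braces escaped for Java.
        Matcher m = Pattern.compile("\\{\\{(.*?)\\}\\}").matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "bla {{bla {{bla bla {{afsaasg}} }} blabla}} {{bla bla}} bla";
        // The lazy quantifier stops at the first closing delimiter it can reach.
        System.out.println(firstLazyMatch(text));
    }
}
```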
In the end I solved this with my own Java function. Perhaps it will help someone:
// Requires: import java.util.ArrayList;
public static ArrayList<String> getRecursivePattern(String sText, String sBegin, String sEnd) {
    ArrayList<String> alReturn = new ArrayList<String>();
    boolean ok1 = true;
    boolean ok2 = true;
    int iStartCount = 0;
    int iEndCount = 0;
    int iStartSearching = 0;
    while (ok1) {
        // Find the opening delimiter of the next top-level block.
        int iAnfang = sText.indexOf(sBegin, iStartSearching);
        ok2 = true;
        if (iAnfang > -1) {
            // Scan from the opening delimiter with fresh counters, so that
            // stray closing delimiters before it are not counted.
            iStartSearching = iAnfang;
            iStartCount = 0;
            iEndCount = 0;
            while (ok2) {
                int iStartCharacter = sText.indexOf(sBegin, iStartSearching);
                int iEndCharacter = sText.indexOf(sEnd, iStartSearching);
                if (iEndCharacter == -1) {
                    // No closing delimiter left: stop.
                    ok2 = false;
                    ok1 = false;
                } else if (iStartCharacter != -1 && iStartCharacter < iEndCharacter) {
                    // An opening delimiter comes first.
                    iStartCount = iStartCount + 1;
                    iStartSearching = iStartCharacter + sBegin.length();
                } else if (iStartCharacter == -1 || iStartCharacter > iEndCharacter) {
                    // A closing delimiter comes first.
                    iEndCount = iEndCount + 1;
                    iStartSearching = iEndCharacter + sEnd.length();
                } else {
                    // Both delimiters match at the same position
                    // (e.g. sBegin equals sEnd): stop instead of looping forever.
                    ok2 = false;
                    ok1 = false;
                }
                if (ok2 && iEndCount == iStartCount) {
                    // Balanced: cut out the block, including both delimiters.
                    ok2 = false;
                    int iEnde = iStartSearching;
                    String sReturn = sText.substring(iAnfang, iEnde);
                    alReturn.add(sReturn);
                }
            }
        } else {
            ok1 = false;
        }
    }
    return alReturn;
}
I call it:
ArrayList<String> alTest=getRecursivePattern("This {{ is a {{Test}} bla }}","{{","}}");
System.out.println(" sTest : " + alTest.get(0));
Upvotes: 0
Views: 73
Reputation: 476557
You can't do this with regular expressions. It is a consequence of the pumping lemma for regular languages. You need to use context-free grammars, or perhaps dedicated tools (like XML/DOM/... parsers).
You can indeed handle a fixed depth, say three levels, but you cannot make this work for an arbitrary number of levels. Even then, it is better to use a context-free grammar tool (such as an LALR parser generator), simply because those are the tools designed to parse such structures.
In other words, if one day someone can enter {{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{ bla }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}} and this is supposed to be valid, it will most likely fail.
One side note:
If the depth is bounded, say at most i levels, you can write a regex for that depth. Assuming the text contains no braces other than the delimiters, depth 1 and depth 2 look like this:
[^{}]*(\{\{[^{}]*\}\}[^{}]*)*
[^{}]*(\{\{[^{}]*(\{\{[^{}]*\}\}[^{}]*)*\}\}[^{}]*)*
But as you can see, the deeper you go, the longer the regex gets, and there is no way to write one for arbitrary depth.
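A pattern for a given maximum depth can be generated mechanically, which also shows how quickly it grows. The following is only a sketch under stated assumptions: the class and method names (BoundedDepthRegex, blockPattern) are hypothetical, the delimiters are fixed to {{ and }}, and the text between delimiters is assumed to contain no other braces. It builds a pattern for a single block rather than for the whole input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BoundedDepthRegex {
    // Builds a regex matching one {{...}} block nested at most maxDepth levels.
    // Assumption: text between delimiters contains no '{' or '}' characters.
    static String blockPattern(int maxDepth) {
        if (maxDepth < 1) {
            throw new IllegalArgumentException("maxDepth must be at least 1");
        }
        String text = "[^{}]*";
        // Depth 1: a block with plain text only.
        String block = "\\{\\{" + text + "\\}\\}";
        // Each further level wraps the previous pattern in another delimiter pair.
        for (int depth = 2; depth <= maxDepth; depth++) {
            block = "\\{\\{" + text + "(?:" + block + text + ")*\\}\\}";
        }
        return block;
    }

    public static void main(String[] args) {
        String input = "bla {{bla {{bla bla {{afsaasg}} }} blabla}} {{bla bla}} bla";
        for (int depth = 1; depth <= 3; depth++) {
            String regex = blockPattern(depth);
            Matcher m = Pattern.compile(regex).matcher(input);
            System.out.println("depth " + depth + ", regex length " + regex.length()
                    + ", first match: " + (m.find() ? m.group() : "none"));
        }
    }
}
```

The pattern length grows with every extra level, and a block nested deeper than maxDepth is not matched as a whole.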
See also this discussion for people who want to parse XML/HTML - another recursive language - with regexes.
As you noted, some regular expression toolkits do provide constructs for counting things. These can be found in the P-languages (PHP, Perl, ...). Strictly speaking, these are not regular expressions (as defined by Kleene; see this Wikipedia article about what a real regex is) but simplified parsers, because they do not describe a regular language. They are currently not available in most regex libraries, including Java's. Some libraries even provide Turing-complete parsers, which can parse anything you can parse algorithmically, but using them is not really recommended for advanced tasks...
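In Java, where the regex library has no such counting constructs, the counting can be done by hand with a simple depth counter. A minimal sketch, assuming the delimiters are exactly {{ and }} and that an unclosed block should simply be ignored (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

final class TopLevelBlocks {
    // Returns every balanced top-level {{...}} block, including its delimiters.
    static List<String> find(String text) {
        List<String> blocks = new ArrayList<>();
        int depth = 0;   // number of currently open {{ delimiters
        int start = -1;  // index where the current top-level block began
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("{{", i)) {
                if (depth == 0) {
                    start = i;
                }
                depth++;
                i += 2;
            } else if (text.startsWith("}}", i)) {
                if (depth > 0) {
                    depth--;
                    if (depth == 0) {
                        blocks.add(text.substring(start, i + 2));
                    }
                }
                // A stray }} at depth 0 is skipped.
                i += 2;
            } else {
                i++;
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        String input = "bla {{bla {{bla bla {{afsaasg}} }} blabla}} {{bla bla}} bla";
        for (String block : find(input)) {
            System.out.println(block);
        }
    }
}
```

Unlike a fixed-depth regex, this works for any nesting depth, because the counter plays the role of the stack that a regular expression does not have.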
Upvotes: 0
Reputation: 249
.NET has special support for matching nested items (balancing groups), so
{{(?>[^\{\}]+|\{(?<DEPTH>)|\}(?<-DEPTH>))*(?(DEPTH)(?!))}}
would do what you want in C# for any level of nesting, but it will not work in Java.
Upvotes: 1
Reputation: 19573
Don't you need to escape the curly braces? I do in Notepad++. Anyway, this should do it for one level of nesting:
\{\{[^{]+\{\{[^{}]+\}\}[^}]+\}\}
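A quick way to check this pattern from Java is sketched below. The class and method names are made up for illustration. The pattern expects exactly one nested pair with non-empty text around it, so on input with deeper nesting it can only match an inner block, not the outermost one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OneLevelRegexDemo {
    // The pattern from this answer, written as a Java string literal.
    static final Pattern ONE_LEVEL =
            Pattern.compile("\\{\\{[^{]+\\{\\{[^{}]+\\}\\}[^}]+\\}\\}");

    // Returns the first match in text, or null if nothing matches.
    static String firstMatch(String text) {
        Matcher m = ONE_LEVEL.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // One nested pair: the whole outer block is matched.
        System.out.println(firstMatch("This {{ is a {{Test}} bla }}"));
        // Three levels: only a two-level inner block can match.
        System.out.println(firstMatch(
                "bla {{bla {{bla bla {{afsaasg}} }} blabla}} {{bla bla}} bla"));
    }
}
```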
Upvotes: 0