mcfly soft

Reputation: 11651

Regex for nested structures

I have the following string:

bla {{bla  {{bla bla {{afsaasg}} }} blabla}} {{bla bla}} bla

I would like to match

{{bla  {{bla bla {{afsaasg}} }} blabla}}

with a regex.

But my regex

{{(.*?)}}

matches

{{bla  {{bla bla {{afsaasg}}

Can anyone help?

Additional info: I expect no more than 2 brackets to be nested at the same time.

In the end I solved this with my own Java function. Perhaps this will help someone:

    import java.util.ArrayList;

    public static ArrayList<String> getRecursivePattern(String sText, String sBegin, String sEnd) {

        ArrayList<String> alReturn = new ArrayList<String>();

        boolean ok1 = true;
        boolean ok2 = true;

        int iStartCount = 0;
        int iEndCount = 0;

        int iStartSearching = 0;

        while (ok1) {
            // Position of the next outermost start pattern
            int iAnfang = sText.indexOf(sBegin, iStartSearching);

            ok2 = true;
            if (iAnfang > -1) {
                // Begin counting at this block, ignoring any stray end pattern before it
                iStartSearching = iAnfang;
                iStartCount = 0;
                iEndCount = 0;

                while (ok2) {

                    int iStartCharacter = sText.indexOf(sBegin, iStartSearching);
                    int iEndCharacter = sText.indexOf(sEnd, iStartSearching);

                    if (iEndCharacter == -1) {
                        // No end pattern left, so the block is never closed: stop
                        ok2 = false;
                        ok1 = false;

                    } else if (iStartCharacter != -1 && iStartCharacter < iEndCharacter) {
                        // Start pattern comes first: one level deeper
                        iStartCount = iStartCount + 1;
                        iStartSearching = iStartCharacter + sBegin.length();
                    } else {
                        // End pattern comes first: one level closed
                        iEndCount = iEndCount + 1;
                        iStartSearching = iEndCharacter + sEnd.length();
                    }
                    if (ok2 && iEndCount == iStartCount) {
                        // Counts are balanced: the block is complete, cut it out
                        ok2 = false;
                        int iEnde = iStartSearching;
                        String sReturn = sText.substring(iAnfang, iEnde);
                        alReturn.add(sReturn);
                    }
                }
            } else {
                ok1 = false;
            }
        }

        return alReturn;
    }

I call it:

    ArrayList<String> alTest=getRecursivePattern("This {{ is a {{Test}} bla }}","{{","}}");
    System.out.println(" sTest : " + alTest.get(0));

Upvotes: 0

Views: 73

Answers (3)

willeM_ Van Onsem

Reputation: 476557

You can't do this with regular expressions. This is a consequence of the pumping lemma. You need to use context-free grammars, or perhaps dedicated tools (like XML/DOM/... parsers).

You can indeed parse this for, say, three levels deep, but you can't make it work for an arbitrary number of levels. Even then, it's better to use context-free grammars (for example with an LALR compiler-compiler), simply because these are the tools designed to parse such structures.

In other words, if one day someone enters {{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{ bla }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}} and this is supposed to be valid, a fixed-depth regex will most likely fail.
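To make the "use a parser" advice concrete, here is a minimal sketch of a hand-written recursive-descent parser for the grammar block := "{{" (block | any other character)* "}}". The class and method names (NestedBlockParser, topLevelBlocks) are made up for this illustration, and the delimiters are hard-coded to {{ and }} as in the question. It is one possible approach under those assumptions, not the only way to do it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
class NestedBlockParser {
    private final String text;
    private int pos = 0;

    private NestedBlockParser(String text) {
        this.text = text;
    }

    /** Returns every outermost {{...}} block of the text, in order of appearance. */
    static List<String> topLevelBlocks(String text) {
        NestedBlockParser p = new NestedBlockParser(text);
        List<String> blocks = new ArrayList<>();
        while (p.pos < text.length()) {
            if (text.startsWith("{{", p.pos)) {
                int start = p.pos;
                p.block();                       // advances pos past the matching "}}"
                blocks.add(text.substring(start, p.pos));
            } else {
                p.pos++;                         // plain text outside any block
            }
        }
        return blocks;
    }

    // block := "{{" (block | any other character)* "}}"
    private void block() {
        pos += 2;                                // consume "{{"
        while (pos < text.length()) {
            if (text.startsWith("}}", pos)) {
                pos += 2;                        // consume "}}" and finish this block
                return;
            }
            if (text.startsWith("{{", pos)) {
                block();                         // nested block: recurse
            } else {
                pos++;                           // ordinary character inside the block
            }
        }
        throw new IllegalArgumentException("unclosed {{ in input");
    }
}
```

Calling NestedBlockParser.topLevelBlocks on the string from the question returns the two outermost blocks, regardless of how deeply the first one is nested. The recursion depth follows the nesting depth, so extremely deep input would need an explicit counter instead.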

One side note:

If the nesting is at most i levels deep, you can use a regex like:

  • for 1: .*?(.*?\{\{.*?\}\}.*?)*.*?
  • for 2: .*?(.*?\{\{.*?(.*?\{\{.*?\}\}.*?)*.*?\}\}.*?)*.*?
  • ...

But as you can see, the deeper you go, the longer the regex gets, and there is no way to handle arbitrary depth.
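The growth of the regex with depth can be seen by generating it in a loop. The sketch below is a stricter variant of the idea above, not the exact patterns from the list: the plain-text parts are not allowed to contain {{ or }}, so that the delimiters can only be consumed by the nested sub-pattern. The class and method names (BoundedNesting, nestedRegex, findBlocks) are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class BoundedNesting {

    /** Builds a regex for {{...}} blocks nested at most maxDepth levels (maxDepth >= 1). */
    static String nestedRegex(int maxDepth) {
        // One character that does not begin a "{{" or "}}" delimiter
        String plain = "(?:(?!\\{\\{|\\}\\}).)";
        // Depth 1: a block that contains no further blocks
        String regex = "\\{\\{" + plain + "*\\}\\}";
        for (int depth = 2; depth <= maxDepth; depth++) {
            // Each extra level wraps the previous pattern once more
            regex = "\\{\\{(?:" + plain + "|" + regex + ")*\\}\\}";
        }
        return regex;
    }

    /** Returns all non-overlapping matches of the bounded-depth regex in text. */
    static List<String> findBlocks(String text, int maxDepth) {
        Matcher m = Pattern.compile(nestedRegex(maxDepth), Pattern.DOTALL).matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }
}
```

With maxDepth set to 3, findBlocks returns both outermost blocks of the sample string from the question. With a smaller maxDepth the outer block no longer fits the pattern and only inner parts are matched, which is exactly the limitation described above.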

See also this discussion for people who want to parse XML/HTML - another recursive language - with regexes.

As you noted, some regular expression toolkits do provide constructs to count things. These can be found in the P-languages (PHP, Perl, ...). Strictly speaking, these aren't regular expressions (as defined by Kleene; see this Wikipedia article about what a real regex is) but simplified parsers, because they don't describe a regular language. They are currently not available in most regex libraries, including Java's. Some libraries even provide Turing-complete parsers, which can parse anything that can be parsed algorithmically, but they are not really recommended for advanced tasks...

Upvotes: 0

user1906580

Reputation: 249

.NET has special support for nested item matching (balancing groups), so {{(?>[^\{\}]+|\{(?<DEPTH>)|\}(?<-DEPTH>))*(?(DEPTH)(?!))}} would do what you want in C# for any level of nesting, but this does not work in Java.

Upvotes: 1

chiliNUT

Reputation: 19573

Don't you need to escape the curly braces? I do in Notepad++. Anyway, this should do it:

\{\{[^{]+\{\{[^{}]+\}\}[^}]+\}\}
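A quick way to try this pattern from Java is sketched below. The class and method names (TwoLevelCheck, firstMatch) are invented for the example, and the backslashes are doubled only because the regex is written as a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical test harness for the regex above.
class TwoLevelCheck {
    // Same pattern as above, escaped for a Java string literal
    static final Pattern TWO_LEVELS =
            Pattern.compile("\\{\\{[^{]+\\{\\{[^{}]+\\}\\}[^}]+\\}\\}");

    /** Returns the first match of the pattern in text, or null if there is none. */
    static String firstMatch(String text) {
        Matcher m = TWO_LEVELS.matcher(text);
        return m.find() ? m.group() : null;
    }
}
```

Note that the pattern hard-codes exactly one nested block. On the three-level sample string from the question, firstMatch returns {{bla bla {{afsaasg}} }}, the inner two levels, rather than the whole outer block, so it only fits input whose nesting is exactly two levels deep.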

Upvotes: 0
