user2412629

How to create a loop with a regular expression?

Honestly, I think I should first ask for your help with the wording of this question.

If you understand what I mean, please edit the title to something more suitable.

Is there a way to write a pattern that can split a text like this?

{{START}}
    {{START}}
        {{START}}
            {{START}}
            {{END}}
        {{END}}
    {{END}}
{{END}}

So every {{START}} is matched with its {{END}}, from the innermost pair to the outermost!

And if I cannot do that with regex alone, what about doing it with PHP?

Thank you in advance.

Upvotes: 5

Views: 3098

Answers (2)

Casimir et Hippolyte

Reputation: 89584

It is possible! You can have each level of content using a recursive regular expression:

$data = <<<LOD
{{START1}}
    aaaaa
    {{START2}}
        bbbbb
        {{START3}}
            ccccc
            {{START4}}
                ddddd
            {{END4}}
        {{END3}}
    {{END2}}
{{END1}}
LOD;

$pattern = '~(?=({{START\d+}}(?>[^{]++|(?1))*{{END\d+}}))~';
preg_match_all ($pattern, $data, $matches);

print_r($matches);

Explanation:

part: ({{START\d+}}(?>[^{]++|(?1))*{{END\d+}})

This part of the pattern describes a nested structure delimited by {{START#}} and {{END#}}:

(             # open the first capturing group
{{START\d+}}  # the opening tag
(?>           # open an atomic group (= backtracking forbidden)
    [^{]++    # anything that is not a {, one or more times (possessive)
  |           # OR
    (?1)      # recurse into the first capturing group itself
)*            # close the atomic group, repeated zero or more times
{{END\d+}}    # the closing tag
)             # close the first capturing group

Now the problem is that you can't capture all the levels with this part alone, because the outermost match consumes all the characters of the string, inner levels included. In other words, you can't match overlapping parts of the string.

The solution is to wrap this whole part inside a zero-width assertion that doesn't consume characters, such as a lookahead (?=...). Result:

(?=({{START\d+}}(?>[^{]++|(?1))*{{END\d+}}))

This will match all the levels.
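
To make the difference concrete, here is a small sketch that runs the same nested subpattern with and without the lookahead. The shortened two-level input and the variable names ($nested, $plain, $levels) are my own choices for illustration, not part of the original code:

```php
<?php
// Shortened two-level input with the same shape as $data above (assumed sample).
$data = '{{START1}} a {{START2}} b {{END2}} {{END1}}';

// The nested-structure subpattern, without the outer capturing group.
$nested = '{{START\d+}}(?>[^{]++|(?1))*{{END\d+}}';

// Without the lookahead, the outer match consumes the inner one.
$plainCount = preg_match_all('~(' . $nested . ')~', $data, $plain);

// With the lookahead, nothing is consumed, so every level is captured.
$levelCount = preg_match_all('~(?=(' . $nested . '))~', $data, $levels);

echo $plainCount, PHP_EOL;   // 1
echo $levelCount, PHP_EOL;   // 2
print_r($levels[1]);         // the outer block first, then the inner one
```

Each level ends up in capture group 1, so $levels[1] is the array to read; $levels[0] only holds empty strings because the lookahead itself matches zero characters.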

Upvotes: 4

tylerl

Reputation: 30867

This is beyond the capability of a regular expression, which can only recognize regular languages. What you're describing requires a pushdown automaton (regular languages are the ones recognized by a finite automaton).

You can use regular expressions to parse the individual elements, but the "depth" part needs to be handled by a language with a concept of memory (PHP is fine for this).

So in your solution, regexes will just be used for identifying your tags, while the real logic for tracking depth and determining which element an END tag belongs to must live in your program itself.
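
As a rough sketch of that approach: the regex below only locates the tags, and a plain PHP array serves as the stack that tracks depth. The function name matchBlocks and the shape of its return value are hypothetical, chosen for illustration, and the tags are assumed to be exactly {{START}} and {{END}} as in the question:

```php
<?php
// Hypothetical helper: pairs each END tag with the most recently opened
// START tag, using an array as a stack.
function matchBlocks(string $text): array
{
    // The regex only finds the tags; PREG_OFFSET_CAPTURE adds byte offsets.
    preg_match_all('~\{\{(START|END)\}\}~', $text, $tags,
                   PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $stack  = [];   // offsets of START tags that are still open
    $blocks = [];   // completed blocks, innermost first

    foreach ($tags as $tag) {
        [$whole, $offset] = $tag[0];
        if ($tag[1][0] === 'START') {
            $stack[] = $offset;
        } elseif ($stack) {
            $start = array_pop($stack);
            $blocks[] = [
                'depth' => count($stack) + 1,   // 1 = outermost
                'text'  => substr($text, $start, $offset + strlen($whole) - $start),
            ];
        }
        // An END with no open START is silently ignored in this sketch.
    }
    return $blocks;
}

$blocks = matchBlocks('{{START}} a {{START}} b {{END}} {{END}}');
foreach ($blocks as $block) {
    echo $block['depth'], ': ', $block['text'], PHP_EOL;
}
```

Because a block is recorded when its END tag is reached, the results come out innermost first, which is the order the question asks for. START tags left open at the end of the input are dropped here; a stricter version could report them as errors.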

Upvotes: 4
