Niet the Dark Absol

Reputation: 324620

Recursive regex help needed

I'm making a template system whereby I can type:

<t:category:item:parameter1:parameter2...>

And have it be replaced with text from a file. The optional parameters are placed in the replacement string as %1, %2...

So far I have this:

$data = preg_replace_callback("/<t:([^:]+):([^>]+)>/",function($m) use (&$userdata) {
    static $fcache = Array();
    $parse = function($file) use (&$fcache,&$lang) {
        // parse the file if it's a new one. False indicates success, otherwise error message is returned
        if( !isset($fcache[$file])) {
            if( !file_exists("text/".$lang."/".$file.".html")) $lang = "en";
            if( !file_exists("text/".$lang."/".$file.".html")) return "<div class=\"alert\">ERROR: File ".$file." not found.</div>";
            $k = "";
            foreach(file("text/".$lang."/".$file.".html") as $l) {
                if( substr($l,0,1) == "|") $k = rtrim(substr($l,1));
                else $fcache[$file][$k] .= $l;
            }
        }
        return false;
    };
    $lang = $userdata && $userdata['language'] ? $userdata['language'] : "uk";
    list(,$file,$d) = $m;
    $params = explode(":",$d);
    $section = array_shift($params);
    if( $e = $parse($file)) return $e;
    if( !$fcache[$file][$section]) {
        $lang = "uk";
        if( $e = $parse($file)) return $e;
    }
    return preg_replace_callback("/%(\d+)/",function($i) use ($params) {
        return htmlspecialchars_decode($params[$i[1]-1]);
    },trim($fcache[$file][$section]));
},$data);

The format of the text file is:

|key
replacement text
|otherkey
more text %1

Anyway, getting to the point: What if one of the parameters is itself a replacement string? For instance, what if I want a string like "Come and visit him soon!" - I'd like to have it be something like:

<t:person:visit:<t:grammar:pronoun_object_m>>

And the file would have:

|visit
Come and visit %1 soon!

|pronoun_object_m
him

However, the current function will take the parameter as the literal <t:grammar:pronoun_object_m and there would be an extra > showing up at the end of the phrase:

Come and visit <t:grammar:pronoun_object_m soon!>

Which would actually show up as:

Come and visit

due to the unparsed replacement looking like an HTML tag...

I'm fairly sure I need a recursive regex, however I am very confused as to how they work. Could anyone please explain how I could "recursify" my regex to allow for embedded parameters like this?

Upvotes: 1

Views: 143

Answers (1)

Martin Ender

Reputation: 44259

The problem with recursive patterns is that they don't work well with preg_replace; they are mostly intended for preg_match. The reason is that a capturing group reused within the recursion gives you access to only a single capture, not one per nesting level. So even preg_replace_callback will not help much here.
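For reference, here is a minimal sketch of what a recursive pattern does with the example from the question. The pattern below is my own illustration, not something from the original code: (?R) re-enters the whole pattern, so a placeholder may contain further placeholders. It finds the complete outer placeholder as one match, but the nested pieces would still have to be taken apart separately.

```php
<?php
$input = 'x <t:person:visit:<t:grammar:pronoun_object_m>> y';

// "<t:" followed by any mix of non-bracket text and nested placeholders, then ">".
// (?R) recurses into the entire pattern; ++ is a possessive quantifier.
preg_match('/<t:(?:[^<>]++|(?R))*>/', $input, $m);

// The whole nested placeholder comes back as a single match.
echo $m[0], "\n";
```

Since the pattern has no capturing groups, $m contains only the full match.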

Here is another possibility:

In <t:person:visit:<t:grammar:pronoun_object_m>>, the reason you get the output you mentioned is that your regex matches this:

<t:person:visit:<t:grammar:pronoun_object_m>

(It cannot go further, because you disallow > within your placeholders.)
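To make this concrete, the following sketch runs the pattern from the question against the nested example and prints what each group captures. The variable names $input and $m are just for illustration.

```php
<?php
$input = '<t:person:visit:<t:grammar:pronoun_object_m>>';

// The question's pattern: the second group may contain anything except ">".
preg_match('/<t:([^:]+):([^>]+)>/', $input, $m);

echo $m[0], "\n"; // whole match, which ends at the first ">"
echo $m[1], "\n"; // the file part
echo $m[2], "\n"; // section and parameters, including the unparsed inner "<t:..."
```

The second group ends up holding visit:<t:grammar:pronoun_object_m, which is why the inner placeholder is passed along as a literal parameter and the final > is left over.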

There are a few ways to get around this. For starters, you could disallow < as well as > within your placeholders:

"/<t:([^:]+):([^<>]+)>/"

Now your pattern will only ever match the innermost placeholders. So you can simply call preg_replace_callback repeatedly until no more replacements are made. How do you find that out? Pass the optional fourth and fifth parameters ($limit and $count); $count receives the number of replacements performed:

do
{
    // Assign the result back, otherwise $data never changes and the loop never ends.
    $data = preg_replace_callback("/<t:([^:]+):([^<>]+)>/", $function, $data, -1, $count);
} while($count);

I also suggest (for legibility) that you define the callback separately, as $function above, rather than inline in the preg_replace_callback call.
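Putting the pieces together, a stripped-down sketch could look like the following. It is not a drop-in replacement for your function: the $templates array is a hypothetical in-memory stand-in for your parsed text files, and $expand is just a name I picked for the callback. Language fallback, caching and error messages are left out.

```php
<?php
// Hypothetical stand-in for the contents of the text files.
$templates = [
    'person'  => ['visit' => 'Come and visit %1 soon!'],
    'grammar' => ['pronoun_object_m' => 'him'],
];

// The callback, defined once outside the preg_replace_callback call.
$expand = function (array $m) use ($templates): string {
    $params  = explode(':', $m[2]);
    $section = array_shift($params);
    if (!isset($templates[$m[1]][$section])) {
        // Assumption: unknown placeholders are dropped. Returning the
        // placeholder unchanged would make the loop below run forever.
        return '';
    }
    // Substitute %1, %2, ... with the corresponding parameters.
    return preg_replace_callback('/%(\d+)/', function (array $i) use ($params): string {
        return $params[$i[1] - 1] ?? '';
    }, $templates[$m[1]][$section]);
};

$data = '<p><t:person:visit:<t:grammar:pronoun_object_m>></p>';

// Each pass replaces the innermost placeholders; stop once a pass replaces nothing.
do {
    $data = preg_replace_callback('/<t:([^:]+):([^<>]+)>/', $expand, $data, -1, $count);
    echo $data, "\n";
} while ($count);
```

The first pass turns the inner placeholder into him, the second pass expands the outer one, and the third pass finds nothing and ends the loop. Whatever the callback returns must not reproduce the placeholder it was given, or $count will never reach zero.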

Upvotes: 2
