ldiqual

Reputation: 15375

Regex: there's a regex inside

I'm falling deeper into the regex's dark side. I need to parse this:

{{word(a|b|c)|word$1}}
{{word(s?)|word$1}}
{{w(a|b|c)ord(s?)|w$1ord$2}}

As you may have noticed, this is a search-and-replace scheme containing regular expressions. The Wikimedia engine handles it very well, but I couldn't find out how it does it: right here.

I just need to get the first part and the second part into two separate variables. For instance:

preg_match(REGEX, '{{word(a|b|c)|word$1}}', $result); // Applying REGEX on this
echo $result[1]; // word(a|b|c)
echo $result[2]; // word$1

How would you do this? It's like a regex within a regex, and I'm completely lost...

Upvotes: 1

Views: 89

Answers (3)

Ry-

Reputation: 225144

It really depends on how deep the nesting can be, but you can just split it by |, taking care not to split it by any | within parentheses. Here's the easy way, I suppose:

$str = 'word(a|b|c)|word$1'; // Trim off the leading and trailing {{ and }}
$items = explode('|', $str);
$realItems = array();

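$count = count($items);

for($i = 0; $i < $count; $i++) {
    $realItem = $items[$i];
    // Keep gluing while parentheses are unbalanced, but never read past the last item
    while($i + 1 < $count && substr_count($realItem, '(') > substr_count($realItem, ')')) {
        // Glue them together and skip one!
        $realItem .= '|' . $items[++$i];
    }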

    $realItems[] = $realItem;
}

Now $realItems contains the separated parts (two per rule here: the pattern and the replacement), which you can simply pass into preg_replace; it'll do all the work for you.
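
As a rough sketch of that last step, the snippet below wraps the splitting loop in a function and feeds the two parts to preg_replace. The function name splitRule, the sample rule with the WORD-$1 replacement, the input text and the ~ delimiter are all assumptions made for illustration; the delimiter must not occur unescaped in the search part.

```php
<?php
// Hypothetical helper: split a rule body on '|' while keeping pipes
// that sit inside unbalanced parentheses attached to their item.
function splitRule(string $body): array
{
    $items = explode('|', $body);
    $count = count($items);
    $parts = [];
    for ($i = 0; $i < $count; $i++) {
        $part = $items[$i];
        // Re-join pieces until every '(' has a matching ')'.
        while ($i + 1 < $count && substr_count($part, '(') > substr_count($part, ')')) {
            $part .= '|' . $items[++$i];
        }
        $parts[] = $part;
    }
    return $parts;
}

// Hypothetical rule, chosen so that the replacement is visible.
$rule = '{{word(a|b|c)|WORD-$1}}';
$body = substr($rule, 2, -2); // strip the leading {{ and the trailing }}
[$search, $replace] = splitRule($body);

// '~' is an assumed delimiter for the user-supplied pattern.
$result = preg_replace('~' . $search . '~', $replace, 'worda and wordc');
echo $result, PHP_EOL;
```

Counting parentheses this way does not account for escaped parentheses such as \( in the search part, so this only suits rules as simple as the ones in the question.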

Upvotes: 1

Qtax

Reputation: 33928

You could match the parts using something like:

{{((?:(?!}}).)+)\|([^|]+?)}}

Note that if you are allowing arbitrary PCRE regex then some very complex and slow patterns can be constructed, possibly allowing simple DoS attacks on your site.
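
To see what the pattern captures, here is a small sketch that runs it over the three rules from the question. The ~ delimiter and the variable names are assumptions chosen for the example; with ~ as the delimiter the pattern needs no extra escaping inside a single-quoted PHP string.

```php
<?php
// The pattern from this answer, wrapped in '~' delimiters (an assumed choice).
$pattern = '~{{((?:(?!}}).)+)\|([^|]+?)}}~';

// The three sample rules from the question.
$rules = [
    '{{word(a|b|c)|word$1}}',
    '{{word(s?)|word$1}}',
    '{{w(a|b|c)ord(s?)|w$1ord$2}}',
];

$parsed = [];
foreach ($rules as $rule) {
    if (preg_match($pattern, $rule, $m)) {
        // $m[1] is the search part, $m[2] is the replacement part.
        $parsed[] = [$m[1], $m[2]];
        echo $m[1], ' => ', $m[2], PHP_EOL;
    }
}
```

The second group excludes | entirely, so this only works as long as the replacement part never contains a pipe.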

Upvotes: 2

fge

Reputation: 121820

It is actually not that hard.

The thing is, any | in the replacement string will always be escaped, i.e. written as \|.

And this is one of the very few occasions where .* is actually useful.

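Do: preg_match('/^{{(.*)\|([^|]+(?:\\\\\|[^|]*)*)}}$/', $str, $result); where $str is the input string. This should do what you want. Note that inside a single-quoted PHP string, the regex \\\| has to be written as \\\\\|.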

The trick here is the second group: it is, again, the normal* (special normal*)* pattern, where normal is [^|] (anything but a pipe), and special is \\\| (a backslash followed by a pipe).
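
Here is a hedged sketch of the normal* (special normal*)* idiom applied to a whole rule. It is a variation, not the exact regex above: normal is taken to be [^|\\] (so it cannot swallow a backslash), special is \\. (a backslash followed by any character), and a (?<!\\) lookbehind is added so that the separating pipe is one that is not preceded by a backslash. The helper name parseRule, the ~ delimiter and the second sample rule are assumptions for illustration.

```php
<?php
// Variation on the pattern above (an assumption, not the answer's exact regex):
//   normal  = [^|\\]   anything but a pipe or a backslash
//   special = \\.      a backslash followed by any character
// (?<!\\) requires the separating pipe not to be preceded by a backslash.
$pattern = '~^{{(.*)(?<!\\\\)\|([^|\\\\]*(?:\\\\.[^|\\\\]*)*)}}$~';

// Hypothetical helper: returns [search, replacement], or null if no match.
function parseRule(string $pattern, string $rule): ?array
{
    return preg_match($pattern, $rule, $m) ? [$m[1], $m[2]] : null;
}

// A rule from the question, and a made-up rule with an escaped pipe
// in its replacement part.
$plain   = parseRule($pattern, '{{word(a|b|c)|word$1}}');
$escaped = parseRule($pattern, '{{a(b|c)|x\|y$1}}');

echo $plain[0], ' => ', $plain[1], PHP_EOL;
echo $escaped[0], ' => ', $escaped[1], PHP_EOL;
```

The greedy .* first runs to the end of the string and then backtracks to the last pipe that is not escaped, which is why the search part may contain as many unescaped pipes as it likes.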

Upvotes: 0
