Dan Lugg

Reputation: 20592

Help with Regex (PCRE)

I'm trying to work out a PCRE regex for matching and capturing variable-length strings in my PHP application. I'd like, if feasible, to do this in a single pass, but if it's more computationally reasonable I can split it into several passes driven by application logic.

A few example strings that could be contained within much larger blocks of text:

{{ var:myVar }}
{{ component:myComponent}}
{{ var:myVar modifier:function[arg1|arg2] }}
{{ region:myRegion modifier:function[arg1|arg2] modifier:function[arg1] }}

As you can see, there can be a considerable variance between the targeted strings. Summarized:

{{ type:name modifierType:modifierName[arg1|arg2|...] }}

Matching tags that are only {{ type:name }} is easy enough, but I can't figure out a way to reliably capture variable-length modifier lists. For the simple type/name pair, I'm using the following case-insensitive, free-spacing pattern:

'% {{ \s+ (?<type>var|component|region):(?<name>[a-z_][a-z0-9_]*) \s+ }} %ix'

I'm likely going to swap the type-list for a generic alphanumeric string capture for forward-compatibility, but this is working for now.

So, any suggestions on capturing both of the following?

{{ component:myComponent }}
{{ var:myVar format:datetime[Y-m-d] container:h3[class=timestamp|id=main] }}

Upvotes: 0

Views: 167

Answers (1)

Amadan

Reputation: 198324

You don't want to grab the whole thing in one regexp, since you probably want the modifiers as an array. In PCRE, a regexp always has a fixed number of captures, corresponding to the number of capturing parentheses; a group that repeats keeps only its last match.
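
To illustrate the fixed-capture point, here is a minimal sketch. The pattern, the sample string, and the variable names are my own assumptions, not taken from the question. The repeated group matches every modifier, but only the last repetition is available afterwards:

```php
<?php
// Assumed sample input with two modifiers.
$subject = '{{ region:myRegion modifier:a[1] modifier:b[2] }}';

// One capturing group for modifiers, repeated with *.
$pattern = '%\{\{\s+(?<type>\w+):(?<name>\w+)'
         . '(?:\s+(?<modifier>\w+:\w+\[[^\]]*\]))*\s*\}\}%';

preg_match($pattern, $subject, $m);

// Both modifiers were matched, but the group holds only the last one.
echo $m['type'], "\n";      // region
echo $m['name'], "\n";      // myRegion
echo $m['modifier'], "\n";  // modifier:b[2]
```

The first modifier is consumed by the match but never shows up in the captures, which is why a single pattern can't hand you the whole list.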

Grab the main thing ({{ something }}), split the contents by spaces, then loop and match each individually, shoving the contents into an appropriate scalar or array (if you allow for more modifiers with the same prefix).
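
The steps above could be sketched roughly like this. The function name parse_tags() and the shape of the returned array are hypothetical, and the sketch assumes that modifier arguments never contain spaces or a closing bracket, since the tag body is split on whitespace:

```php
<?php
// Hypothetical helper, not from the original post.
function parse_tags(string $text): array
{
    $tags = [];
    $ident = '[a-z_][a-z0-9_]*';

    // Step 1: grab the body of every {{ ... }} block.
    preg_match_all('/\{\{\s*(.*?)\s*\}\}/s', $text, $blocks);

    foreach ($blocks[1] as $body) {
        // Step 2: split the body on whitespace (assumes no spaces in args).
        $parts = preg_split('/\s+/', $body, -1, PREG_SPLIT_NO_EMPTY);
        if (!$parts) {
            continue;
        }

        // Step 3: the first part must be the type:name pair.
        $first = array_shift($parts);
        if (!preg_match("/^(?<type>$ident):(?<name>$ident)$/i", $first, $m)) {
            continue;
        }
        $tag = ['type' => $m['type'], 'name' => $m['name'], 'modifiers' => []];

        // Step 4: match each remaining part on its own and collect it.
        $modPattern = "/^(?<type>$ident):(?<name>$ident)(?:\[(?<args>[^\]]*)\])?$/i";
        foreach ($parts as $part) {
            if (!preg_match($modPattern, $part, $mm)) {
                continue; // skip anything that is not a well-formed modifier
            }
            $hasArgs = isset($mm['args']) && $mm['args'] !== '';
            $tag['modifiers'][] = [
                'type' => $mm['type'],
                'name' => $mm['name'],
                'args' => $hasArgs ? explode('|', $mm['args']) : [],
            ];
        }
        $tags[] = $tag;
    }
    return $tags;
}

$result = parse_tags(
    'x {{ var:myVar format:datetime[Y-m-d] container:h3[class=timestamp|id=main] }} '
    . 'y {{ component:myComponent}}'
);
print_r($result);
```

Storing the modifiers as a list keeps repeated prefixes (two modifier: entries, say) from overwriting each other. If arguments may contain spaces, the whitespace split would need replacing with something bracket-aware.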

Upvotes: 2
