Reputation: 20592
I'm trying to work out a PCRE regex for matching and capturing variable-length strings, for use in my PHP application. I'd like (if feasible) to do this in a single pass, but if it turns out to be more computationally reasonable I can split it into several passes driven by application logic.
A few example strings that could be contained within much larger blocks of text:
{{ var:myVar }}
{{ component:myComponent }}
{{ var:myVar modifier:function[arg1|arg2] }}
{{ region:myRegion modifier:function[arg1|arg2] modifier:function[arg1] }}
As you can see, there can be a considerable variance between the targeted strings. Summarized:
{{ type:name modifierType:modifierName[arg1|arg2|...] }}
The delimiters are {{ and }}.
type or name is expressed as [a-z_][a-z0-9_]*; there will be only one type:name pair, and it will appear first.
modifierType or modifierName is expressed as [a-z_][a-z0-9_]*; the modifierName is followed by an argument list of one or more arguments in square brackets, [ and ]. The argument list is delimited by pipes |. There can be zero or more modifierType:modifierName[argumentList] sets.
Elements are separated by whitespace, \s+.
Anyways, matching sets that are only {{ type:name }}
is easy enough, but I can't figure out a way to effectively grab variable-length modifier lists. For the simple type/name pair, I'm using the following case-insensitive/free-spacing string:
'% {{ \s+ (?<type>var|component|region):(?<name>[a-z_][a-z0-9_]*) \s+ }} %ix'
I'm likely going to swap the type-list for a generic alphanumeric string capture for forward-compatibility, but this is working for now.
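For reference, a minimal sketch of how the pattern above behaves against a larger block of text. The sample text and the $pairs variable are made up for illustration; only the pattern comes from the question.

```php
<?php
// The pattern from the question, unchanged. The x flag makes the literal
// spaces in the pattern insignificant; the i flag makes it case-insensitive.
$pattern = '% {{ \s+ (?<type>var|component|region):(?<name>[a-z_][a-z0-9_]*) \s+ }} %ix';

// Hypothetical sample text, not taken from the real application.
$text = 'Hello {{ var:myVar }}, see {{ component:myComponent }}.';

// PREG_SET_ORDER yields one array per matched tag.
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $m) {
    // Named groups are available by name on each match.
    $pairs[] = [$m['type'], $m['name']];
}
```

This only covers the plain type:name form; a tag carrying modifiers is not matched at all by this pattern.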
So, any suggestions on capturing both:
{{ component:myComponent }}
{{ var:myVar format:datetime[Y-m-d] container:h3[class=timestamp|id=main] }}
Upvotes: 0
Views: 167
Reputation: 198324
You don't want to grab the whole thing in one regexp, since you probably want the modifiers as an array. A regexp will always have a constant number of captures, corresponding to the number of capturing parentheses.
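To illustrate the point about a fixed number of captures, here is a small sketch. The one-shot pattern is my own assumption of what a single-regex attempt might look like, not something from the question: the modifier group is repeated with *, and only its last repetition is kept in the result.

```php
<?php
// Hypothetical single-regex attempt: one named group for the modifier,
// repeated zero or more times inside a non-capturing group.
$pattern = '%\{\{\s+(?<type>[a-z_][a-z0-9_]*):(?<name>[a-z_][a-z0-9_]*)'
         . '(?:\s+(?<modifier>[a-z_][a-z0-9_]*:[a-z_][a-z0-9_]*\[[^\]]*\]))*'
         . '\s+\}\}%i';

// Example string taken from the question.
$input = '{{ region:myRegion modifier:function[arg1|arg2] modifier:function[arg1] }}';

$matched = preg_match($pattern, $input, $m);

// The whole tag matches, but the repeated group holds only the final
// iteration; the first modifier is not available as a capture.
$lastModifier = $m['modifier'];
```

The match succeeds, yet the first modifier (with arg1|arg2) is lost, which is why a second step is needed to collect the modifiers.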
Grab the main thing ({{ something }}), split the contents by spaces, then loop and match each piece individually, shoving the contents into an appropriate scalar or array (if you allow for more modifiers with the same prefix).
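A rough sketch of that approach in PHP. The function name parse_tags() and the shape of the returned array are assumptions made for illustration, and the sketch assumes arguments never contain whitespace, as in the examples from the question.

```php
<?php
/**
 * Hypothetical helper (name and return shape are illustrative only).
 * Returns one entry per {{ ... }} tag found in $text.
 */
function parse_tags(string $text): array
{
    $ident  = '[a-z_][a-z0-9_]*';
    $headRe = '%^(' . $ident . '):(' . $ident . ')$%i';
    $modRe  = '%^(' . $ident . '):(' . $ident . ')\[([^\]]*)\]$%i';

    $tags = [];

    // Step 1: grab the body of every {{ ... }} block.
    preg_match_all('%\{\{\s*(.*?)\s*\}\}%s', $text, $blocks);

    foreach ($blocks[1] as $body) {
        // Step 2: split the body on whitespace.
        $tokens = preg_split('%\s+%', $body, -1, PREG_SPLIT_NO_EMPTY);

        // The first token must be the type:name pair.
        $first = array_shift($tokens);
        if ($first === null || !preg_match($headRe, $first, $head)) {
            continue; // not a tag we understand; skip it
        }

        $tag = ['type' => $head[1], 'name' => $head[2], 'modifiers' => []];

        // Step 3: match each remaining token as a modifier.
        foreach ($tokens as $token) {
            if (preg_match($modRe, $token, $mod)) {
                $tag['modifiers'][] = [
                    'type' => $mod[1],
                    'name' => $mod[2],
                    'args' => explode('|', $mod[3]),
                ];
            }
        }

        $tags[] = $tag;
    }

    return $tags;
}

// Usage with the two examples from the question.
$result = parse_tags(
    'A {{ component:myComponent }} and '
    . '{{ var:myVar format:datetime[Y-m-d] container:h3[class=timestamp|id=main] }}'
);
```

Tokens that do not look like a modifier are silently ignored here; depending on the application it may be preferable to reject the whole tag instead. If arguments may contain spaces, splitting on whitespace is no longer enough and the modifiers would have to be pulled out with preg_match_all on the body instead.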
Upvotes: 2