Christian Beikov

Reputation: 16410

Parsing nested structures in PHP with preg_match

Hello, I want to build a small meta language that gets parsed and cached for better performance, so I need to be able to parse the meta code into objects or arrays.

Start identifier: {

End identifier: }

You can navigate through objects with a dot (.), and you can also perform arithmetic, logical, and relational operations.

Here is an example of what the meta language looks like:

or nested

or with operations

or more logical

I think most of you already understand what I want. At the moment I can only handle simple operations (no nesting, and only two operands), but nesting to get values with dynamic property names works fine. Text concatenation also works, for example:

"Hello {myObj.name}! How are you {myObj.type}?"

It would also be nice to support a short if, i.e. a ternary of the form (condition) ? (true-case) : (false-case), but I have no idea how to parse all of that. At the moment I am using loops combined with some regular expressions, but it would probably be faster, and more maintainable, if more of the work were done by regex.

Could anyone give me some hints, or would anyone like to help? You can visit the project site to see what I need this for: http://sourceforge.net/projects/blazeframework/

Thanks in advance!

Upvotes: 0

Views: 1246

Answers (2)

Daniel Vandersluis

Reputation: 94163

It is non-trivial to match an indeterminate number of nested braces using regular expressions, because in general you will match either too much or too little.

For instance, consider the following string, which combines two examples from your input:

Hello {myObj.name}! {mySelf.{myObj.{keys["ObjProps"][0]}.personAttribute.first}.size}?

If you use the first regular expression that probably comes to mind, \{.*\}, to match braces, you will get one match:

{myObj.name}! {mySelf.{myObj.{keys["ObjProps"][0]}.personAttribute.first}.size}

This is because regular expressions are greedy by default and match as much as possible.

From there, we can try a non-greedy pattern, \{.*?\}, which matches as little as possible between the opening and closing brace. On the same string, this pattern gives two matches: {myObj.name} and {mySelf.{myObj.{keys["ObjProps"][0]}. The second is obviously not a full expression: the engine starts at the leftmost opening brace it finds and stops at the first closing brace after it, because that is the shortest match beginning at that position.
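A quick way to see both behaviours is to run the two patterns through preg_match_all() on the same string. This is a minimal sketch; the variable names are only for illustration.

```php
<?php
$s = 'Hello {myObj.name}! {mySelf.{myObj.{keys["ObjProps"][0]}.personAttribute.first}.size}?';

// Greedy: .* runs to the LAST "}" in the string, swallowing everything between.
preg_match_all('/\{.*\}/', $s, $greedy);

// Lazy: .*? stops at the FIRST "}" after each "{", cutting the nested one short.
preg_match_all('/\{.*?\}/', $s, $lazy);

print_r($greedy[0]); // one match spanning both expressions
print_r($lazy[0]);   // two matches, the second one truncated
```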

PCRE does allow recursive regular expressions, but you're going to end up with a very complex pattern if you go down that route.
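For reference, a balanced-brace pattern using PCRE recursion looks roughly like the sketch below. (?R) re-enters the whole pattern whenever a nested "{" appears, and the possessive [^{}]++ keeps the engine from backtracking through runs of plain text. It assumes braces never occur inside string literals; handling that case is where the pattern starts to grow.

```php
<?php
$s = 'Hello {myObj.name}! {mySelf.{myObj.{keys["ObjProps"][0]}.personAttribute.first}.size}?';

// Group 1 captures the inside of each outermost {...}; (?R) recurses into the
// whole pattern for every nested "{", so inner braces stay balanced.
$balanced = '/\{((?:[^{}]++|(?R))*)\}/';

preg_match_all($balanced, $s, $m);

// $m[0]: the complete outermost expressions, braces included.
// $m[1]: their contents, which could be fed back into the same pattern
//        to descend one nesting level at a time.
print_r($m[0]);
```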

The best solution, in my opinion, would be to construct a tokenizer (which could be powered by regex) to turn your text into an array of tokens which can then be parsed.
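A minimal sketch of such a tokenizer, assuming a token set of identifiers, numbers, double-quoted strings, and a handful of operators (including ? and : for the ternary). The function name tokenize() and the token type names are hypothetical. It switches between a text mode outside braces and an expression mode inside them, and calls preg_match() with an offset and the \G anchor so each call consumes exactly one token.

```php
<?php
/**
 * Hypothetical tokenizer for the template language described in the question.
 * Outside braces, everything up to the next "{" becomes one TEXT token.
 * Inside braces, the expression is split into IDENT, NUMBER, STRING and OP
 * tokens; nested braces become OPEN/CLOSE tokens. Whitespace is dropped.
 */
function tokenize(string $src): array
{
    // \G anchors each match at the offset passed to preg_match().
    $exprPattern = '/\G(?:(?<ws>\s+)'
        . '|(?<number>\d+(?:\.\d+)?)'
        . '|(?<ident>[A-Za-z_]\w*)'
        . '|(?<string>"(?:[^"\\\\]|\\\\.)*")'
        . '|(?<op>==|!=|<=|>=|&&|\|\||[-+*\/%<>!?:.\[\]()])'
        . '|(?<brace>[{}]))/';

    $tokens = [];
    $pos = 0;
    $len = strlen($src);
    $depth = 0;

    while ($pos < $len) {
        if ($depth === 0) {
            // Text mode: copy literal text up to the next opening brace.
            $next = strpos($src, '{', $pos);
            if ($next === false) {
                $next = $len;
            }
            if ($next > $pos) {
                $tokens[] = ['TEXT', substr($src, $pos, $next - $pos)];
            }
            if ($next < $len) {
                $tokens[] = ['OPEN', '{'];
                $depth = 1;
            }
            $pos = $next + 1;
            continue;
        }
        // Expression mode: consume exactly one token at the current offset.
        if (preg_match($exprPattern, $src, $m, 0, $pos) !== 1) {
            throw new RuntimeException("Unexpected character at offset $pos");
        }
        foreach (['ws', 'number', 'ident', 'string', 'op', 'brace'] as $type) {
            $value = $m[$type] ?? '';
            if ($value === '') {
                continue; // this alternative did not match
            }
            if ($type === 'brace') {
                $depth += $value === '{' ? 1 : -1;
                $tokens[] = [$value === '{' ? 'OPEN' : 'CLOSE', $value];
            } elseif ($type !== 'ws') {
                $tokens[] = [strtoupper($type), $value];
            }
            break;
        }
        $pos += strlen($m[0]);
    }
    if ($depth !== 0) {
        throw new RuntimeException('Unclosed "{" in template');
    }
    return $tokens;
}

print_r(tokenize('Hello {myObj.{keys.first}}, you are {age >= 18 ? "an adult" : "a minor"}.'));
```

The resulting flat list of [type, value] pairs can then go to a small recursive-descent parser, so that nesting, operator precedence, and the ternary are handled by the parser rather than by the regex.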

Upvotes: 1

Andreas Linden

Reputation: 12721

Maybe have a look at the PREG_OFFSET_CAPTURE flag.
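One way to use that flag, as a sketch: collect the offset of every brace with preg_match_all() and PREG_OFFSET_CAPTURE, then pair the braces with a stack. The helper name matchBraces() is hypothetical, and like the recursive regex it does not account for braces inside string literals.

```php
<?php
/**
 * Hypothetical helper: pair up "{" and "}" using their byte offsets.
 * Returns [openOffset, closeOffset, depth] triples, innermost pairs first.
 */
function matchBraces(string $s): array
{
    // With PREG_OFFSET_CAPTURE each match is a [text, byteOffset] pair.
    preg_match_all('/[{}]/', $s, $m, PREG_OFFSET_CAPTURE);

    $stack = []; // offsets of "{" that are still open
    $pairs = [];
    foreach ($m[0] as [$brace, $offset]) {
        if ($brace === '{') {
            $stack[] = $offset;
        } elseif ($stack === []) {
            throw new RuntimeException("Unmatched '}' at offset $offset");
        } else {
            $open = array_pop($stack);
            // count($stack) is the nesting depth of the pair just closed.
            $pairs[] = [$open, $offset, count($stack)];
        }
    }
    if ($stack !== []) {
        throw new RuntimeException('Unmatched "{" at offset ' . end($stack));
    }
    return $pairs;
}

$s = '{mySelf.{myObj.{keys["ObjProps"][0]}.personAttribute.first}.size}';
foreach (matchBraces($s) as [$open, $close, $depth]) {
    echo $depth, ': ', substr($s, $open, $close - $open + 1), "\n";
}
```

Because a pair is recorded when its closing brace is reached, inner expressions come out before the expressions that contain them, which is the order in which dynamic property names would have to be resolved.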

Upvotes: 0
