Reputation: 718
I'm writing a simple PHP template parser for learning purposes, and I'm trying to implement logic for if conditions. The parser will be very limited, but that's fine; I'm only interested in getting this one feature working.
Here's the code:
$pat = '/{if (\b[A-Za-z0-9_]+\b)}(.*?){\\/if}/s';
$message = '{if another}Another{/if} {if outer}Outer {if inner}Inner {if innermost}Innermost{/if}{/if}{/if}';
$vars = ['another' => false, 'outer' => true, 'inner' => true, 'innermost' => true];
while (preg_match_all($pat, $message, $m)) {
    foreach ($m[1] as $n => $key) {
        $all = $m[0][$n];
        $text = $m[2][$n];
        if (!isset($vars[$key]) || empty($vars[$key])) {
            $message = str_replace($all, '', $message);
        } else {
            $message = str_replace($all, $text, $message);
        }
    }
}
echo $message;
Parser requirements for if conditions:
Unfortunately, my understanding of regular expressions is very limited, and I only managed to construct a simple regexp (probably a less than adequate one). Basically, I need to support if conditions written in this format:
{if something}Something{/if}
As you can see from the code, the example message contains two unrelated blocks: a standalone 'another' block, and an 'outer' block that contains two more if conditions nested inside each other.
When all variables contain truthy values, everything seems to work as expected. But if a variable that is part of the nested structure is set to a falsy value, the text in $message isn't parsed properly: I'm left with an extra, unnecessary {/if} closing tag.
When I inspected the branch that checks whether the variable's value is empty, I noticed that the match doesn't contain the correct portion of the if block, so I can't find and replace the whole statement whose condition is false.
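The mismatch described above can be reproduced in isolation. The following is a minimal sketch that reuses the pattern from the question on a shortened, hypothetical input ($nested is not part of the original code) and prints what a single match actually captures:

```php
<?php
// Same pattern as in the question.
$pat = '/{if (\b[A-Za-z0-9_]+\b)}(.*?){\\/if}/s';

// Hypothetical shortened input with one level of nesting.
$nested = '{if outer}Outer {if inner}Inner{/if}{/if}';

preg_match($pat, $nested, $m);

// The lazy (.*?) stops at the first {/if}, which closes the inner block,
// so the match for 'outer' ends too early and one {/if} is left over.
echo $m[0], PHP_EOL; // {if outer}Outer {if inner}Inner{/if}
echo $m[2], PHP_EOL; // Outer {if inner}Inner
```

Because the match for 'outer' ends at the inner closing tag, replacing it leaves the final {/if} behind in the message.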
I suspect my regexp is flawed, but I'm not sure whether what I'm asking for is possible with regular expressions at all. Should I try a different approach, or is there a small fix that would make this work?
At the very least, I'd like to know the right algorithm for solving this problem, given the requirements above.
Thank you in advance for any information you can provide.
Upvotes: 1
Views: 821
Reputation:
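Strictly speaking, you "can't" do this with a regex, because arbitrarily nested structures are not a regular language. Most engines have powerful extensions, though, and some (including PCRE, which PHP's preg_* functions use) may allow you to do this. See this question for details on matching nested structures. See also this answer on the limitations of regex.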
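As an illustration of such an extension, PCRE supports recursive patterns through (?R). The following is only a sketch under that assumption, not a recommendation; the function name renderRecursive is hypothetical, and malformed templates (for example an unclosed {if}) are simply left untouched by it:

```php
<?php
// Sketch: resolve nested {if name}...{/if} blocks with a recursive PCRE pattern.
// renderRecursive is a hypothetical helper name, not from the original post.
function renderRecursive(string $template, array $vars): string
{
    // Group 1: variable name. Group 2: block body, made of whole nested
    // blocks (via the recursion (?R)) or single characters that do not
    // start an opening or closing tag.
    $pattern = '/\{if ([A-Za-z0-9_]+)\}((?:(?R)|(?!\{if |\{\/if\}).)*)\{\/if\}/s';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($vars): string {
            // Falsy or missing variable: drop the block, nested blocks included.
            if (empty($vars[$m[1]])) {
                return '';
            }
            // Truthy: keep the body and resolve the blocks nested inside it.
            return renderRecursive($m[2], $vars);
        },
        $template
    );

    if ($result === null) {
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }
    return $result;
}

$message = '{if another}Another{/if} {if outer}Outer {if inner}Inner {if innermost}Innermost{/if}{/if}{/if}';
$vars = ['another' => false, 'outer' => true, 'inner' => false, 'innermost' => true];
echo renderRecursive($message, $vars); // prints " Outer " with no stray {/if}
```

The pattern matches only balanced blocks, so the outermost match always ends at its own closing tag. It is harder to read and to extend than a small parser, which is why the parser approach is usually preferred.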
The "right" way to do this is with a parser. An introduction to parsing is far too large for a Stack Overflow answer. I recommend reading Engineering a Compiler or, for something more lightweight (and free), Let's Build a Compiler or Crafting Interpreters.
The basic approach is to find the grammar for the template language (or, failing that, reconstruct it yourself) and identify the lexical elements, or tokens. That is, you can use a regex to match {if var_name}, {/if}, and normal text, then operate on those elements. The problem becomes much easier once you have that separation.
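To make the token-based approach concrete, here is a minimal sketch in PHP. It assumes the grammar consists only of {if name}, {/if}, and plain text, as in the question; the function names tokenize and render are hypothetical, and the decision to throw on unbalanced tags is a design choice of this sketch rather than a requirement from the question.

```php
<?php
// Split the template into tokens: {if name}, {/if}, and runs of plain text.
function tokenize(string $template): array
{
    // PREG_SPLIT_DELIM_CAPTURE keeps the tags themselves in the result.
    return preg_split(
        '/(\{if [A-Za-z0-9_]+\}|\{\/if\})/',
        $template,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
}

// Walk the tokens once, using a stack to remember the visibility of
// each enclosing block.
function render(string $template, array $vars): string
{
    $output = '';
    $stack = [];      // visibility of the enclosing blocks
    $visible = true;  // whether text at the current depth is emitted

    foreach (tokenize($template) as $token) {
        if (preg_match('/^\{if ([A-Za-z0-9_]+)\}$/', $token, $m)) {
            $stack[] = $visible;
            // Visible only if the parent is visible and the variable is truthy.
            $visible = $visible && !empty($vars[$m[1]]);
        } elseif ($token === '{/if}') {
            if (!$stack) {
                throw new RuntimeException('Unmatched {/if}');
            }
            $visible = array_pop($stack);
        } elseif ($visible) {
            $output .= $token;
        }
    }

    if ($stack) {
        throw new RuntimeException('Unclosed {if}');
    }
    return $output;
}

$message = '{if another}Another{/if} {if outer}Outer {if inner}Inner {if innermost}Innermost{/if}{/if}{/if}';
$vars = ['another' => false, 'outer' => true, 'inner' => false, 'innermost' => true];
echo render($message, $vars); // prints " Outer " with no stray {/if}
```

Because every {/if} pops exactly the state pushed by its matching {if}, a falsy variable hides its whole block, including anything nested inside it, and the template is processed in a single pass.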
Upvotes: 1