Reputation: 1592
I have a problem that I have no idea how to solve, and I'm not sure whether regular expressions are the best way. My idea is to get the name, the variables (parameters) and the content (body) of every function in a file. This is my regular expression:
preg_match_all('/function (.*?)\((.*?)\)(.*?)\{(.*?)\}/s',$content,$funcs,PREG_SET_ORDER);
And this is my test file:
function testfunc($text)
{
    if ($text)
    {
        return 1;
    }
    return 0;
}
Of course I only get everything up to the first "}", the one before return 0;. Is there a way to get everything in the function, i.e. to find the matching "}"?
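To see the problem concretely, here is a small sketch (the variable names are illustrative) that runs the question's regular expression against the test file. Because every group is lazy, the body group ends at the first "}" it meets, which belongs to the inner if block:

```php
<?php
// The test file from the question, as a nowdoc string.
$content = <<<'EOF'
function testfunc($text)
{
    if ($text)
    {
        return 1;
    }
    return 0;
}
EOF;

// The question's pattern: (.*?)\} stops at the FIRST "}" after the opening brace.
preg_match_all('/function (.*?)\((.*?)\)(.*?)\{(.*?)\}/s', $content, $funcs, PREG_SET_ORDER);

// Group 4 is cut off inside the if block, so "return 0;" is missing.
var_dump($funcs[0][4]);
```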
Upvotes: 1
Views: 204
Reputation: 784918
Contrary to popular belief, PHP's regex engine (PCRE) supports recursive patterns, which let you match nested, balanced brackets. Consider this code:
$str = <<<'EOF'
function testfunc($text) {
    if ($text) {
        return 1;
    }
    return 0;
}
EOF;
if ( preg_match('/ \{ ( (?: [^{}]* | (?0) )+ ) \} /x', $str, $m) )
    echo $m[0];
Output:
{
    if ($text) {
        return 1;
    }
    return 0;
}
To capture the function header as well, wrap the brace-matching part in its own group and recurse into that group with the relative reference (?-1):
$str = <<<'EOF'
function testfunc($text) {
    if ($text) {
        return 1;
    }
    return 0;
}
EOF;
if ( preg_match('/ (function [^{]+ ) ( \{ (?: [^{}]* | (?-1) )* \} ) /x', $str, $m) )
    print_r ($m);
Output:
Array
(
    [0] => function testfunc($text) {
    if ($text) {
        return 1;
    }
    return 0;
}
    [1] => function testfunc($text)
    [2] => {
    if ($text) {
        return 1;
    }
    return 0;
}
)
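The same idea extends to the original goal of collecting the name, parameter list and body of every function in a file. This is only a sketch, not a PHP parser: the named groups (name, params, block) are my own choice, and it assumes that braces never appear inside strings or comments and that parameter lists contain no parentheses (e.g. no array() default values). The possessive [^{}]++ stops the engine from backtracking into runs of non-brace characters, which avoids very slow failures on unbalanced input.

```php
<?php
$source = <<<'EOF'
function testfunc($text) {
    if ($text) {
        return 1;
    }
    return 0;
}

function other($a, $b) {
    return $a + $b;
}
EOF;

// (?&block) recurses into the named group "block", so nested braces must balance.
$pattern = '/
    function \s+ (?<name> \w+ ) \s*
    \( (?<params> [^)]* ) \) \s*
    (?<block> \{ (?: [^{}]++ | (?&block) )* \} )
/x';

preg_match_all($pattern, $source, $funcs, PREG_SET_ORDER);

foreach ($funcs as $f) {
    // Strip the outer braces to get just the body.
    $body = substr($f['block'], 1, -1);
    printf("%s(%s): %d chars of body\n", $f['name'], $f['params'], strlen($body));
}
```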
Upvotes: 3
Reputation: 20889
Is there a way to get everything in the function so find the right "}".
Short Answer: no.
Long Answer:
This cannot be handled with a single expression. "{" and "}" can also appear inside a method body, which makes it hard to find the correct closing "}". You would need to process ALL pairs of "{}" (iteratively or recursively) and manually pick out ALL pairs that have a "method name" in front of them.
This isn't simple either, however, because you need to exclude all the statements that look like a function but are valid inside a method body, such as anonymous functions (function () { ... }).
I don't think regex is the way to go for such a task. EVEN if you managed to create all the required regex patterns, performance would be worse than with any dedicated parser.
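One concrete way a brace inside a method body breaks the regex approach: in this sketch (the one-line function is made up) the body returns a string that contains "}". A balanced-brace pattern like the recursive one from the first answer, written here with a possessive quantifier, cannot tell that brace apart from a real one, so the match stops too early:

```php
<?php
// A function whose body contains a "}" inside a string literal.
$src = 'function f() { return "}"; }';

// Balanced-brace pattern: (?R) recurses into the whole pattern.
preg_match('/\{(?:[^{}]++|(?R))*\}/', $src, $m);

// The match ends at the brace inside the string, not at the real end of f().
var_dump($m[0]); // string(11) "{ return "}"
```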
Upvotes: 0
Reputation: 8641
Regular expressions are not the best tool for that job. Parsers are.
No doubt you can use regexp callbacks to eventually achieve what you intend, but the result would be ungodly obfuscated and fragile.
A parser can easily do the same job. Better still, if you are planning on parsing PHP with PHP, you can use the Zend engine's own tokenizer, which PHP exposes through token_get_all(), to do the hard part for you.
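Here is a minimal sketch of the tokenizer route, assuming the tokenizer extension is enabled (it is in most PHP builds). The helper name extractFunctions() and the shape of its result are my own inventions; it only recognises named functions and methods, and it treats the "variables" as the T_VARIABLE tokens found between the name and the opening brace. Because strings and comments arrive as single tokens, braces inside them are never counted.

```php
<?php
// Sketch: list named functions using PHP's own tokenizer instead of a regex.
// extractFunctions() is a hypothetical helper, not a built-in function.
function extractFunctions(string $code): array
{
    $tokens = token_get_all($code);
    $n = count($tokens);
    $result = [];

    for ($i = 0; $i < $n; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_FUNCTION) {
            continue;
        }

        // Skip whitespace; a named function must be followed by its name.
        $j = $i + 1;
        while ($j < $n && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }
        if ($j >= $n || !is_array($tokens[$j]) || $tokens[$j][0] !== T_STRING) {
            continue; // anonymous or by-reference ("function &name") function: not handled
        }
        $name = $tokens[$j][1];

        // Collect the parameter variables up to the opening "{".
        $params = [];
        $k = $j + 1;
        while ($k < $n && $tokens[$k] !== '{' && $tokens[$k] !== ';') {
            if (is_array($tokens[$k]) && $tokens[$k][0] === T_VARIABLE) {
                $params[] = $tokens[$k][1];
            }
            $k++;
        }
        if ($k >= $n || $tokens[$k] === ';') {
            continue; // abstract or interface method: no body
        }

        // Walk to the matching "}". "{$" and "${" inside interpolated
        // strings open a brace that a plain "}" closes, so count them too.
        $depth = 0;
        $body = '';
        for (; $k < $n; $k++) {
            $t = $tokens[$k];
            if ($t === '{' || (is_array($t) && in_array($t[0], [T_CURLY_OPEN, T_DOLLAR_OPEN_CURLY_BRACES], true))) {
                $depth++;
            } elseif ($t === '}') {
                $depth--;
            }
            $body .= is_array($t) ? $t[1] : $t;
            if ($depth === 0) {
                break;
            }
        }

        $result[] = ['name' => $name, 'params' => $params, 'body' => $body];
        $i = $k; // continue scanning after this function
    }

    return $result;
}

// Example input: note the "}" inside a string literal.
$code = <<<'EOF'
<?php
function testfunc($text) {
    if ($text) {
        return "}";
    }
    return 0;
}
EOF;

foreach (extractFunctions($code) as $f) {
    echo $f['name'], '(', implode(', ', $f['params']), ")\n", $f['body'], "\n";
}
```

Note that the input must start with an opening <?php tag; otherwise token_get_all() returns everything as a single T_INLINE_HTML token.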
Upvotes: 1
Reputation: 476557
Not in general with a plain regular expression. You can of course define a regex that handles two levels of nesting, something like function (.*?)\((.*?)\)(.*?)\{([^{}]*(\{[^{}]*\}[^{}]*)*)\} (with the /s modifier), but since such structures can be nested arbitrarily deep, you will eventually run out of regex :D. One needs a context-free grammar to do this.
You can generate parsers for such grammars with, for instance, Yacc, Bison or Gppg.
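To make the depth argument concrete, here is a sketch. A two-level pattern (written with [^{}] so that it allows at most one level of inner braces) picks the wrong braces on three-level input, while a plain depth counter, the simplest form of the stack a context-free parser keeps, finds the matching brace at any depth. matchingBrace() is a hypothetical helper, and like any character-level approach it ignores strings and comments.

```php
<?php
// Hypothetical helper: offset of the "}" that closes the "{" at $open, or -1.
function matchingBrace(string $s, int $open): int
{
    $depth = 0;
    for ($i = $open, $len = strlen($s); $i < $len; $i++) {
        if ($s[$i] === '{') {
            $depth++;
        } elseif ($s[$i] === '}') {
            $depth--;
            if ($depth === 0) {
                return $i;
            }
        }
    }
    return -1; // unbalanced input
}

$src = 'function f() { if (a) { if (b) { x; } } }';

// Two-level pattern: the body may contain at most one level of inner { ... }.
preg_match('/function (.*?)\((.*?)\)(.*?)\{([^{}]*(\{[^{}]*\}[^{}]*)*)\}/s', $src, $m);

// Depth counter: works for any nesting depth.
$open = strpos($src, '{');
$close = matchingBrace($src, $open);
$body = substr($src, $open + 1, $close - $open - 1);

var_dump($m[4]);  // only an inner block, not the whole body of f()
var_dump($body);  // the whole body of f()
```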
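Furthermore, note what the quantifiers mean: .* means zero or more times and .+ means one or more times; the extra ? in .*? makes the quantifier lazy, so it matches as few characters as possible instead of as many as possible.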
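Whether the trailing ? matters is easy to check on a short string (the sample input is made up):

```php
<?php
$s = '{a}{b}';

// Greedy: .* grabs as much as possible, then backs off to the LAST "}".
preg_match('/\{(.*)\}/', $s, $greedy);

// Lazy: .*? grabs as little as possible, stopping at the FIRST "}".
preg_match('/\{(.*?)\}/', $s, $lazy);

echo $greedy[1], "\n"; // a}{b
echo $lazy[1], "\n";   // a
```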
Upvotes: 0