Tim Strijdhorst

Reputation: 1569

A PHP regex to extract php functions from code files

I'm trying to write a PHP regex to extract functions from PHP source code. Until now I used a recursive regex to extract everything between {}, but then it also matches things like if statements. When I use something like:

preg_match_all("/(function .*\(.*\))({([^{}]+|(?R))*})/", $data, $matches);

It doesn't work when there is more than one function in the file (probably because (?R) recurses into the whole pattern, including the 'function' part).

Is there any way to do this?

Example file:

<?php
if($useless)
{
  echo "i don't want this";
}

function bla($wut)
{
  echo "i do want this";
}
?>

Thanks

Upvotes: 5

Views: 4071

Answers (3)

Daweb

Reputation: 84

Regex accepting recursive curly brackets in body

I know there is an accepted answer, but in case the tokenizer cannot be used, this is a simple regex to extract functions (name, parameters and body) from PHP code.

The main difference from ioseb's answer is that this regex accepts nested curly brackets in the body, meaning it won't stop at the first closing curly bracket.

/function\s+(?<name>\w+)\s*\((?<param>[^\)]*)\)\s*(?<body>\{(?:[^{}]+|(?&body))*\})/

Explanation

/                                   # delimiter
function                            # function keyword
\s+                                 # at least one whitespace
(?<name>\w+)                        # function name (a word) => group "name"
\s*                                 # optional whitespace(s)
\((?<param>[^\)]*)\)                # function parameters => group "param"
\s*                                 # optional whitespace(s)
(?<body>\{(?:[^{}]+|(?&body))*\})   # function body (nested curly brackets allowed) => group "body"
/                                   # delimiter

Example

$data = '
    <?php 
    function my_function($param){
        if($param === true){
            // This is true
        }else if($param === false){
            // This is false
        }else{
            // This is not
        }
    }
    ?>
';

preg_match_all("/function\s+(?<name>\w+)\s*\((?<param>[^\)]*)\)\s*(?<body>\{(?:[^{}]+|(?&body))*\})/", $data, $matches);
print_r($matches['body']);

/*
Array
(
    [0] => {
        if($param === true){
            // This is true
        }else if($param === false){
            // This is false
        }else{
            // This is not
        }
    }
)
*/

Limitation

Curly brackets have to be balanced, because the regex also counts brackets inside string literals. For example, this body will only be partially extracted:

function my_function(){
    echo "A curly bracket : }";
    echo "Another curly bracket : {";
}

/*
Array
(
    [0] => {
    echo "A curly bracket : }
)
*/
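
One possible way to soften this limitation is to let the body skip over quoted strings before it looks for brackets. This is only a sketch under assumptions: FUNCTION_PATTERN and extract_function_bodies() are names made up for this example, and the pattern still ignores comments, heredocs and nowdocs, so a bracket or a lone quote inside a comment will confuse it.

```php
<?php
// Hypothetical variant of the regex above: quoted strings are consumed as a
// unit, so brackets inside them are not counted. Comments and heredocs are
// NOT handled.
const FUNCTION_PATTERN = <<<'REGEX'
/
function \s+ (?<name>\w+) \s*
\( (?<param>[^)]*) \) \s*
(?<body>
  \{
  (?:
      [^{}'"]++              # code without brackets or quotes (possessive)
    | '(?:[^'\\]|\\.)*'      # single-quoted string
    | "(?:[^"\\]|\\.)*"      # double-quoted string
    | (?&body)               # nested block
  )*
  \}
)
/xs
REGEX;

// Hypothetical helper: returns the body of every function matched in $source.
function extract_function_bodies(string $source): array
{
    preg_match_all(FUNCTION_PATTERN, $source, $matches);
    return $matches['body'];
}

$data = <<<'CODE'
<?php
function my_function(){
    echo "A curly bracket : }";
    echo "Another curly bracket : {";
}
CODE;

print_r(extract_function_bodies($data));
```

With this input the whole body, including both echo lines, ends up in the result instead of being cut off at the first "}".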

Upvotes: 0

ioseb

Reputation: 16951

Moved here from duplicate question: PHP, Regex and new lines

Regex solution:

$regex = '~
  function                 #function keyword
  \s+                      #one or more whitespace characters
  (?P<function_name>.*?)   #function name itself
  \s*                      #optional white spaces
  (?P<parameters>\(.*?\))  #function parameters
  \s*                      #optional white spaces
  (?P<body>\{.*?\})        #body of a function (up to the first closing brace)
~six';

if (preg_match_all($regex, $input, $matches)) {
  print_r($matches);
}
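
A quick check of how this pattern behaves with nested blocks may be useful. The input string below is made up for the demonstration; the pattern itself is the one from this answer, with only the comments dropped.

```php
<?php
// Same pattern as above, comments removed.
$regex = '~
  function
  \s+
  (?P<function_name>.*?)
  \s*
  (?P<parameters>\(.*?\))
  \s*
  (?P<body>\{.*?\})
~six';

// Hypothetical input with an if block nested inside the function body.
$input = 'function f($a) { if ($a) { echo 1; } return 2; }';

preg_match_all($regex, $input, $matches);

// The lazy .*? stops at the first "}", so the body is cut short.
echo $matches['body'][0], "\n"; // prints: { if ($a) { echo 1; }
```

So the pattern is fine for functions without nested curly brackets, but it truncates bodies that contain if/else, loops or other blocks.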

P.S. As suggested in another answer, the tokenizer is the preferable way to go.

Upvotes: 5

user187291

Reputation: 53940

Regexps are the wrong way to do this. Consider the tokenizer or reflection instead.
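
For illustration, here is a minimal sketch of the tokenizer approach, built on token_get_all(). The helper name extract_functions() and the shape of its return value are assumptions made for this example, not something from the question. It walks the token stream, finds each T_FUNCTION token that is followed by a name, and collects the source text until the curly brackets balance. Because brackets inside string literals arrive as part of string tokens, they are not counted.

```php
<?php
// extract_functions() is a hypothetical helper, not a PHP built-in.
// It returns every named function or method found in $source.
function extract_functions(string $source): array
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $found = [];

    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_FUNCTION) {
            continue;
        }

        // The name is the first T_STRING before the parameter list.
        // Closures have no name, and "use function ...;" has no body.
        $name = null;
        for ($j = $i + 1; $j < $count; $j++) {
            if ($tokens[$j] === '(' || $tokens[$j] === ';') {
                break;
            }
            if (is_array($tokens[$j]) && $tokens[$j][0] === T_STRING) {
                $name = $tokens[$j][1];
                break;
            }
        }
        if ($name === null) {
            continue;
        }

        // Rebuild the source text until the curly brackets balance again.
        $code = '';
        $depth = 0;
        $complete = false;
        for ($k = $i; $k < $count; $k++) {
            $token = $tokens[$k];
            $code .= is_array($token) ? $token[1] : $token;

            if ($token === '{') {
                $depth++;
            } elseif (is_array($token)
                && in_array($token[0], [T_CURLY_OPEN, T_DOLLAR_OPEN_CURLY_BRACES], true)) {
                // "{$var}" and "${var}" open with a token but close with a plain "}".
                $depth++;
            } elseif ($token === '}') {
                if (--$depth === 0) {
                    $complete = true;
                    break;
                }
            } elseif ($token === ';' && $depth === 0) {
                break; // declaration without a body (abstract or interface method)
            }
        }

        if ($complete) {
            $found[] = ['name' => $name, 'code' => $code];
        }
    }

    return $found;
}

// The example file from the question (closing tag left out).
$data = <<<'CODE'
<?php
if($useless)
{
  echo "i don't want this";
}

function bla($wut)
{
  echo "i do want this";
}
CODE;

foreach (extract_functions($data) as $function) {
    echo $function['name'], ":\n", $function['code'], "\n";
}
```

Note that token_get_all() only tokenizes code after an opening <?php tag, and that nested functions and class methods are reported as separate entries by this sketch.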

Upvotes: 6
