Reputation: 217
I want to find all occurrences of parent::, together with the called function and its parameters.
For example:
parent::test( new ReflectionClass($this) );
But the following regular expression doesn't match the outer brackets - only the inner ones:
parent::(.*)\((.*)\);
Array /* output */
(
[0] => parent::test( new ReflectionClass($this) );
[1] => test( new ReflectionClass
[2] => $this)
)
How do I have to modify the pattern?
This is for a PHP script, so I can use other string functions, too.
Upvotes: 0
Views: 1075
Reputation: 4682
Using regexes to parse code is a REALLY bad idea. Take a look at PHP's Tokenizer, which you can use to parse PHP code into an array of tokens. You can then use that array to find the information you need.
You can also look at PHP-Token-Reflection's source code as an example of how to get meaningful information from those tokens.
Basically, you would need to find T_STRING occurrences with 'parent' as the string contents, followed by T_DOUBLE_COLON, followed by another T_STRING that contains the method name, then go forward and start counting the depth of the parentheses: whenever you get to a '(', increase the counter by one; whenever you get to a ')', decrease the counter by one. Keep a record of everything you find in the process until the counter gets back to 0.
Something like this should work (not actually tested):
<?php
$tokens = token_get_all(...); // pass the PHP source code to scan
for ($i = 0, $size = count($tokens); $i < $size; $i++) {
    // "parent" is a T_STRING token; single characters such as '(' are plain strings
    if ($tokens[$i][0] === T_STRING && $tokens[$i][1] === 'parent'
        && $tokens[++$i][0] === T_DOUBLE_COLON && $tokens[++$i][0] === T_STRING) {
        $method = $tokens[$i][1];
        $depth = 0;
        $contents = array();
        do {
            $contents[] = $token = $tokens[++$i];
            if ($token === '(') {
                $depth++;
            } elseif ($token === ')') {
                $depth--;
            }
        } while ($depth > 0 && $i < $size - 1); // also stop at the end of the token list
        echo "Call to $method with contents:\n";
        print_r(array_slice($contents, 1, -1)); // slices off the opening '(' and closing ')'
    }
}
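For reference, here is a self-contained variant of the same idea that can be run as-is. It is only a sketch: the helper name findParentCalls() and the shape of its return value are made up for illustration, and it assumes the input starts with an opening <?php tag so that token_get_all() treats it as code. Unlike the loop above, it tolerates whitespace before the '(' and returns the arguments as a string.

```php
<?php
// Hypothetical helper: returns one entry per parent::method(...) call in $source.
function findParentCalls(string $source): array
{
    $tokens = token_get_all($source);
    $size = count($tokens);
    $calls = [];
    for ($i = 0; $i < $size - 2; $i++) {
        // Look for the token sequence: parent  ::  methodName
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_STRING
            || strtolower($tokens[$i][1]) !== 'parent'
            || !is_array($tokens[$i + 1]) || $tokens[$i + 1][0] !== T_DOUBLE_COLON
            || !is_array($tokens[$i + 2]) || $tokens[$i + 2][0] !== T_STRING) {
            continue;
        }
        $method = $tokens[$i + 2][1];
        $j = $i + 3;
        // Skip optional whitespace between the method name and '('.
        while ($j < $size && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }
        if ($j >= $size || $tokens[$j] !== '(') {
            continue; // e.g. parent::SOME_CONSTANT, which is not a call
        }
        $depth = 0;
        $text = '';
        for (; $j < $size; $j++) {
            $token = $tokens[$j];
            $text .= is_array($token) ? $token[1] : $token;
            // Only bare '(' and ')' tokens count, so brackets inside string literals are ignored.
            if ($token === '(') {
                $depth++;
            } elseif ($token === ')' && --$depth === 0) {
                break;
            }
        }
        // Drop the outer brackets and surrounding whitespace.
        $calls[] = ['method' => $method, 'arguments' => trim(substr($text, 1, -1))];
        $i = $j;
    }
    return $calls;
}

print_r(findParentCalls('<?php parent::test( new ReflectionClass($this) );'));
```

Because only standalone '(' and ')' tokens change the depth, a bracket inside a quoted string such as 'a)' does not end the call early, which is the main advantage over a regex.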
Upvotes: 2
Reputation: 7583
If you are only interested in the function and whatever is inside the round brackets, and most parent:: calls fit on a single line, this may work for you:
parent::(.*?)\((.*)\);
The first capture stops just before the first ( it encounters, as it is not greedy.
The second capture will not stop until it reaches the last ); on the same line.
Note: Do not use the s modifier, as the dot would then also match newlines and the greedy second capture would run to the last ); across multiple lines of your code.
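A quick way to try the pattern from PHP, as a minimal sketch using the line from the question as input (the variable names are only for illustration):

```php
<?php
// Example line taken from the question.
$code = 'parent::test( new ReflectionClass($this) );';

// No "s" modifier: "." does not match newlines, so every match stays on one line.
if (preg_match_all('/parent::(.*?)\((.*)\);/', $code, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        echo 'method: ', $match[1], "\n";          // text before the first "("
        echo 'arguments: ', trim($match[2]), "\n"; // text up to the last ");" on the line
    }
}
```

If a line contains two parent:: calls, the greedy second group will swallow both, so this only holds under the one-call-per-line assumption.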
Upvotes: 1
Reputation: 10967
What you are trying to do is generally not possible with regular expressions. To do what you want, you have to be able to count things, which is something regular expressions can't do.
Making the matching greedy will eventually lead to matching too much, especially when you are supporting multiple line input.
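Regular expressions in the classical sense indeed cannot count. PCRE, the engine behind PHP's preg_* functions, goes beyond them with recursive subpatterns that can keep track of balanced brackets. The following is a minimal sketch of that different technique, not something to rely on blindly: it assumes no brackets appear inside string literals or comments, and the second sample call is made up for illustration.

```php
<?php
$code = 'parent::test( new ReflectionClass($this) );' . "\n"
      . 'parent::other(foo(1), bar(2));'; // hypothetical second call

// Group 1: the method name. Group 2: a balanced "(...)" block.
// (?2) recurses into group 2 whenever a nested "(" is met.
$pattern = '/parent::(\w+)\s*(\((?:[^()]++|(?2))*\))/';

preg_match_all($pattern, $code, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    // Strip the outer brackets from the balanced block.
    echo $match[1], ' => ', trim(substr($match[2], 1, -1)), "\n";
}
```

A bracket inside a quoted argument, such as ')' in a string, would still confuse this pattern, which is where a tokenizer-based approach is safer.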
To replace every occurrence of parent::, you probably don't have to match the method call exactly. Maybe it is enough to match something like this:
parent::(.*);
Then you can replace the parent:: with something else and use the first matching group to put whatever was in the document at this position.
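In PHP this could look roughly like the sketch below. The replacement text self:: is purely a placeholder chosen for illustration; $1 in the replacement string refers to the first matching group.

```php
<?php
// Example line taken from the question.
$code = 'parent::test( new ReflectionClass($this) );';

// "self::" is a hypothetical replacement; $1 re-inserts whatever followed parent::
// up to the last ";" on the same line.
$result = preg_replace('/parent::(.*);/', 'self::$1;', $code);

echo $result, "\n";
```

Without the s modifier the dot stops at line breaks, so each replacement stays within a single line.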
Upvotes: 2
Reputation: 188014
Here is an example which is not really robust, but it would match the case in your question.
(parent::)([^\(]*)\(([^\(]*)\(([^()]*)\)
Here is a live regex test to experiment with: http://rubular.com/r/WwRsRTf7E6 (Note: rubular.com is targeted at Ruby, but should be similar enough for PHP).
The matched elements would be in this case:
parent::
test
new ReflectionClass
$this
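The same pattern can be tried from PHP with preg_match, as in this small sketch (the variable names are just for illustration). The third group keeps the leading space from the input, hence the trim:

```php
<?php
// Example line taken from the question.
$code = 'parent::test( new ReflectionClass($this) );';
$pattern = '/(parent::)([^\(]*)\(([^\(]*)\(([^()]*)\)/';

if (preg_match($pattern, $code, $m)) {
    // $m[0] is the whole match; $m[1] to $m[4] are the four groups.
    $groups = array_map('trim', array_slice($m, 1));
    print_r($groups);
}
```

The pattern expects exactly one nested pair of brackets, so a call such as parent::test($a) with no inner brackets will not match.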
If you want something more robust, you might want to look into parsing tools (e.g. write a short grammar that matches PHP function calls) or static code analysis tools, as these often include AST generators. I have no personal experience with this one, but it sounds quite comprehensive:
pfff is a set of tools and APIs to perform some static analysis, dynamic analysis, code visualizations, code navigations, or style-preserving source-to-source transformations such as refactorings on source code. For now the effort is focused on PHP ...
Upvotes: 1