Reputation: 458
I need a regular expression that grabs the code inside a pair of curly braces. There are other questions about this, but mine is a little different.
Consider this sample code:
public function my_method($my_input) {
if(true == false) { $me = "Forever alone. :("; }
if(true == true) { $me = "No longer alone. :}"; }
if(false == false) { $me = ":{ - This is so Wrong."; }
}
Ignore the "public function my_method($my_input)" part. How can I grab
if(true == false) { $me = "Forever alone. :("; }
if(true == true) { $me = "No longer alone. :}"; }
if(false == false) { $me = ":{ - This is so Wrong."; }
without being misled by "{" and "}" characters inside strings (and comments etc., of course)?
My knowledge of regular expressions is very limited and I'm having a hard time achieving this. :/
Upvotes: 1
Views: 269
Reputation: 46280
I made a regex that works in most cases, even when quotes are escaped with a backslash. Here is an example script. I have commented the regex; note that I had to escape every ' in it with a backslash, because I use ' as the delimiter of the PHP string that holds the regex.
The regex is recursive, so there is no limit on how deeply the braces can be nested. However, the braces must be balanced (every { needs a matching }), which is to be expected.
$str =
'
public function my_method($my_input) {
if(true == false) { $me = "Forever alone. :("; }
if(true == true) { $me = "No longer alone. :}"; }
if(true == true) { $me = \'No longer alone. :}\'; }
if(true == true) { $me = \'No longer \\\' alone. :}\'; }
if(false == false) { $me = ":{ - This is so Wrong."; }
}
public function my_method($my_input) {
if(true == false) { $me = "Forever happy. :("; }
if(true == true) { $me = "No longer happy. :}"; }
if(true == true) { $me = \'No longer happy. :}\'; }
if(true == true) { $me = \'No longer \\\' happy. :}\'; }
if(false == false) { $me = ":{ - This is so Wrong."; }
}
';
preg_match_all(
'/
{ # opening {
( # capturing group
(?: # non-capturing group
(?: # non-capturing group
[^{}"\']+ # anything but { } " and \'
| # or
" # opening "
(?: # non-capturing group
[^"\\\]* # anything but " and \
| # or
\\\" # a \ followed by a "
)* # as often as possible
" # closing "
| # or
\' # opening \'
(?: # non-capturing group
[^\'\\\\]* # anything but \' and \
| # or
\\\\\' # a \ followed by a \'
)* # as often as possible
\' # closing \'
)* # as often as possible
| # or
(?R) # repeat the whole pattern
)* # as often as possible
) # close capturing group
} # closing }
/xs',
$str,
$matches
);
print_r($matches);
Upvotes: 3
Reputation: 318488
Regexps are not the right tool for this; see @phimuemue's answer for more details.
You can use PHP's own tokenizer in your script, though. However, it will not simply give you "what's inside some block" but rather the tokens inside the block. Depending on what you want to do, you may need to reconstruct the source code from the tokens.
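As a rough sketch of the tokenizer approach: the snippet below uses token_get_all() from PHP's tokenizer extension and glues the token texts back together to recover the body of the first brace block. The helper name extract_first_block is made up for illustration, and the sketch assumes the input has balanced braces. Braces inside string literals are not a problem here, because the tokenizer returns a whole literal as one token.

```php
<?php
// Hypothetical helper (not part of PHP): returns the source text between the
// first "{" and its matching "}", or null if no complete block is found.
function extract_first_block(string $code): ?string
{
    // token_get_all() only tokenizes PHP code that follows an opening tag.
    $tokens = token_get_all('<?php ' . $code);
    $depth = 0;
    $body = '';

    foreach ($tokens as $token) {
        // A token is either an array [id, text, line] or a one-character string.
        $id = is_array($token) ? $token[0] : null;
        $text = is_array($token) ? $token[1] : $token;

        // "{$var}" and "${var}" inside double-quoted strings open with special
        // tokens but close with a plain "}", so they count as openers too.
        $opens = $token === '{'
            || $id === T_CURLY_OPEN
            || $id === T_DOLLAR_OPEN_CURLY_BRACES;
        $closes = $token === '}';

        if ($closes && $depth > 0) {
            $depth--;
            if ($depth === 0) {
                return $body; // matching brace of the first block reached
            }
        }
        if ($depth > 0) {
            $body .= $text; // rebuild the source from the token texts
        }
        if ($opens) {
            $depth++;
        }
    }

    return null; // unbalanced braces or no block at all
}

$sample = 'public function my_method($my_input) {' . "\n"
        . 'if(true == true) { $me = "No longer alone. :}"; }' . "\n"
        . '}';
echo extract_first_block($sample);
```

Comments come back as tokens of their own as well, so a "}" inside a comment does not disturb the depth counter.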
Upvotes: 2
Reputation: 35983
Matching parentheses is one of the prototypical things you should not attempt with regular expressions (it's too complicated for regexps even without braces inside strings and the like).
That is because (formal) languages with nested parentheses are not regular; they are described by context-free grammars, which are considerably more powerful than a simple regex. At a very high level, regular expressions "cannot count up to arbitrarily large numbers", i.e. they cannot recognize which closing parenthesis belongs to which opening parenthesis as long as you allow arbitrary nesting depth, as PHP does (at least in principle).
You would be better off using a tool that supports context-free grammars, or an existing PHP parser.
To extract functions yourself, you could look for the keyword function (or other keywords that introduce a block) and advance to the opening brace ({). Then go on character by character until you find the matching closing brace (}), keeping track of whether you are currently inside a string, a comment, or the like.
However, I would not wish doing this by hand on you, since I imagine it is quite cumbersome to take care of all the possible corner cases...
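For illustration only, the character-by-character scan could look roughly like this. The function name scan_first_block is hypothetical, and the sketch deliberately covers just quoted strings with backslash escapes, /* */ comments, and // and # line comments. Heredocs, nowdocs, PHP 8 attributes (#[...]) and other corner cases are not handled, which is exactly why an existing parser is the safer choice.

```php
<?php
// Hypothetical helper: returns the text between the first "{" and its
// matching "}", ignoring braces inside strings and comments.
function scan_first_block(string $code): ?string
{
    $length = strlen($code);
    $depth = 0;
    $start = null;
    $i = 0;

    while ($i < $length) {
        $char = $code[$i];
        $next = $i + 1 < $length ? $code[$i + 1] : '';

        if ($char === '"' || $char === "'") {
            // Skip a quoted string; a backslash escapes the next character.
            $quote = $char;
            $i++;
            while ($i < $length && $code[$i] !== $quote) {
                $i += ($code[$i] === '\\') ? 2 : 1;
            }
        } elseif ($char === '/' && $next === '*') {
            // Skip a block comment up to its closing "*/".
            $end = strpos($code, '*/', $i + 2);
            if ($end === false) {
                return null;
            }
            $i = $end + 1;
        } elseif (($char === '/' && $next === '/') || $char === '#') {
            // Skip a line comment up to the end of the line.
            $end = strpos($code, "\n", $i);
            if ($end === false) {
                return null;
            }
            $i = $end;
        } elseif ($char === '{') {
            if ($depth === 0) {
                $start = $i + 1; // body starts right after the first "{"
            }
            $depth++;
        } elseif ($char === '}' && $depth > 0) {
            $depth--;
            if ($depth === 0) {
                return substr($code, $start, $i - $start);
            }
        }
        $i++;
    }

    return null; // no complete block found
}

$sample = 'function my_method($my_input) {' . "\n"
        . 'if(false == false) { $me = ":{ - This is so Wrong."; }' . "\n"
        . '}';
echo scan_first_block($sample);
```

To target a specific function rather than the first block, one could first locate the function keyword (for example with strpos) and pass the remainder of the source to the helper.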
Upvotes: 4