Reputation: 2001
I'm trying to search through my code, replacing all old-style PHP array()s with the shorthand [] style. However, I'm having some trouble creating a working/reliable regex...
What I currently have: (^|[\s])array\((['"](\s\S)['"]|[^)])*\)
// Match All
array('array()')
array('key' => 'value');
array(
'key' => 'value',
'key2' => '(value2)'
);
array()
array()
array()
// Match Specific Parts
function (array $var = array()) {}
$this->in_array(array('something', 'something'));
// Don't match
toArray()
array_merge()
in_array();
I've created a Regex101 for it...
EDIT: This isn't the answer to the question, but one alternative is to use PHPStorm's "Traditional syntax array literal detected" inspection...
How to:
Open the Code menu
Choose Run inspection by name... (Ctrl + Alt + Shift + I)
Type Traditional syntax array literal detected
Press <Enter>
Press <Enter>
Review the results in the Inspection window.
Upvotes: 1
Views: 1049
Reputation: 89547
It is possible but not trivial, since you need to fully describe two parts of PHP syntax (strings and comments) to prevent parentheses inside them from being interpreted. Here is a way to do it with PHP itself:
$pattern = <<<'EOD'
~
(?(DEFINE)
(?<quotes> (["']) (?: [^"'\\]+ | \\. | (?!\g{-1})["'] )*+ (?:\g{-1}|\z) )
(?<heredoc> <<< (["']?) ([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*) \g{-2}\R
(?>\N*\R)*?
(?:\g{-1} ;? (?:\R | \z) | \N*\z)
)
(?<string> \g<quotes> | \g<heredoc> )
(?<inlinecom> (?:// |\# ) \N* $ )
(?<multicom> /\*+ (?:[^*]+|\*+(?!/))*+ (?:\*/|\z))
(?<com> \g<multicom> | \g<inlinecom> )
(?<nestedpar> \( (?: [^()"'<]+ | \g<com> | \g<string> | < | \g<nestedpar>)*+ \) )
)
(?:\g<com> | \g<string> ) (*SKIP)(*FAIL)
|
(?<![-$])\barray\s*\( ((?:[^"'()/\#]+|\g<com>|/|\g<string>|\g<nestedpar>)*+) \)
~xsm
EOD;
do {
$code = preg_replace($pattern, '[${11}]', $code, -1, $count);
} while ($count);
The pattern contains two parts: the first is a definition part and the second is the main pattern.
The definition part is enclosed in (?(DEFINE)...) and contains named subpattern definitions for several useful elements (in particular "string", "com" and "nestedpar"). These subpatterns are used later in the main pattern.
The idea is to never look for a parenthesis inside a comment, inside a string, or among nested parentheses.
The first line, (?:\g<com> | \g<string> ) (*SKIP)(*FAIL), skips all comments and strings until the next array declaration (or until the end of the string).
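The skipping technique can be tried in isolation. The following is a minimal sketch, not the full pattern: it assumes only single-quoted strings need protecting and only handles the empty array() form. The names $skipPattern and $source are made up for the illustration.

```php
<?php
// Simplified illustration: only single-quoted strings are skipped here,
// whereas the full pattern also covers double quotes, heredocs and comments.
$skipPattern = <<<'EOD'
~'(?:[^'\\]|\\.)*'(*SKIP)(*FAIL)|\barray\(\)~
EOD;

// Sample input (hypothetical): one real declaration, one inside a string.
$source = <<<'PHP'
$a = array(); $s = 'array()';
PHP;

// The first branch consumes a whole string, then (*SKIP)(*FAIL) discards it
// and restarts the search after it, so only the second branch ever replaces.
$result = preg_replace($skipPattern, '[]', $source);

echo $result, PHP_EOL; // $a = []; $s = 'array()';
```

Only the array() outside the quotes is rewritten, because the engine never gets a chance to start a match inside the skipped string.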
The last line describes the array declaration itself. In detail:
(?<![-$])\b # check that "array" is not part of a variable or function name
array \s*\(
( # capture group 11
(?: # describe the possible content
[^"'()/\#]+ # all that is not a quote, a round bracket, a slash, a hash sign
| # OR
\g<com> # a comment
|
/ # a slash that is not a part of a comment
|
\g<string> # a string
|
\g<nestedpar> # nested round brackets
)*+
)
\)
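To see what the opening of that last line accepts and rejects, here is a small sketch that tests only the prefix (lookbehind, word boundary, "array", optional whitespace, opening parenthesis) against a few sample lines. The variable names and sample lines are assumptions chosen for the demo, loosely based on the cases in the question.

```php
<?php
// Only the opening of the main pattern, without the content and closing ")".
$openPattern = '~(?<![-$])\barray\s*\(~';

$lines = [
    '$a = array(1);',        // match
    '$f = array (4);',       // match: whitespace before "(" is allowed
    '$b = $array(2);',       // no match: "$" sits right before "array"
    '$c = in_array(3, $a);', // no match: "_" is a word character, so \b fails
    '$d = array_merge($a);', // no match: "array" is not followed by "("
    '$e = toArray();',       // no match: the pattern is case-sensitive
];

// preg_grep keeps the entries that match the pattern.
$matched = preg_grep($openPattern, $lines);

foreach ($lines as $line) {
    printf("%-24s %s\n", $line, in_array($line, $matched, true) ? 'match' : 'no match');
}
```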
About nested array declarations:
When array declarations are nested, the present pattern only finds the outermost one in each pass.
The do...while loop deals with nested array declarations, because it is not possible to replace several nesting levels in one pass (there is a way with preg_replace_callback, but it isn't very handy). To stop the loop, the last parameter of preg_replace is used. This parameter receives the number of replacements performed in the target string.
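The pass-by-pass behaviour of the loop can be observed with a much smaller pattern. The sketch below assumes a simplified stand-in that balances parentheses but ignores strings and comments, so it is only meant for this demonstration; convertArrays and $simple are hypothetical names.

```php
<?php
// Simplified stand-in for the full pattern: it balances parentheses but
// ignores strings and comments, so it is not safe on real source code.
$simple = '~\barray\s*\(((?:[^()]++|\((?1)\))*+)\)~';

// Repeats the replacement until a pass performs no replacement at all.
// Returns the converted code and the number of passes that were run.
function convertArrays(string $code, string $pattern): array
{
    $passes = 0;
    do {
        // $count receives the number of replacements made in this pass.
        $code = preg_replace($pattern, '[$1]', $code, -1, $count);
        $passes++;
    } while ($count);

    return [$code, $passes];
}

[$out, $passes] = convertArrays('array(1, array(2, array(3)))', $simple);
echo $out, ' after ', $passes, ' passes', PHP_EOL; // [1, [2, [3]]] after 4 passes
```

Each pass unwraps one nesting level, and the fourth pass finds nothing, which is what ends the loop.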
Upvotes: 4