user1960364

Reputation: 2001

Find/Replace array() with Regular Expression

I'm trying to search through my code and replace all old-style PHP array()s with the shorthand [] syntax. However, I'm having some trouble creating a working, reliable regex...

What I currently have: (^|[\s])array\((['"](\s\S)['"]|[^)])*\) (View on Regex101)

// Match All
array('array()')

array('key' => 'value');
array(
    'key'  => 'value',
    'key2' => '(value2)'
);
    array()
  array()
array()

// Match Specific Parts
function (array $var = array()) {}
$this->in_array(array('something', 'something'));

// Don't match
toArray()
array_merge()
in_array();

I've created a Regex101 for it...

EDIT: This isn't the answer to the question, but one alternative is to use PHPStorm's Traditional syntax array literal detected inspection...


Upvotes: 1

Views: 1049

Answers (1)

Casimir et Hippolyte

Reputation: 89547

It is possible but not trivial, since you need to fully describe two parts of the PHP syntax (strings and comments) so that parentheses inside them are not interpreted. Here is a way to do it with PHP itself:

$pattern = <<<'EOD'
~
(?(DEFINE)
    (?<quotes> (["']) (?: [^"'\\]+ | \\. | (?!\g{-1})["'] )*+ (?:\g{-1}|\z) )
    (?<heredoc> <<< (["']?) ([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*) \g{-2}\R
                (?>\N*\R)*?
                (?:\g{-1} ;? (?:\R | \z) | \N*\z)
    )
    (?<string> \g<quotes> | \g<heredoc> )

    (?<inlinecom> (?:// |\# ) \N* $ )
    (?<multicom> /\*+ (?:[^*]+|\*+(?!/))*+ (?:\*/|\z))
    (?<com> \g<multicom> | \g<inlinecom> )

    (?<nestedpar> \( (?: [^()"'<]+ | \g<com> | \g<string> | < | \g<nestedpar>)*+ \) )
)

(?:\g<com> | \g<string> ) (*SKIP)(*FAIL)
|
(?<![-$])\barray\s*\( ((?:[^"'()/\#]+|\g<com>|/|\g<string>|\g<nestedpar>)*+) \)
~xsm
EOD;

do {
    $code = preg_replace($pattern, '[${11}]', $code, -1, $count);
} while ($count);

The pattern contains two parts, the first is a definition part and the second is the main pattern.

The definition part is enclosed in (?(DEFINE)...) and contains named subpattern definitions for the useful elements (in particular "string", "com" and "nestedpar"). These subpatterns match nothing by themselves; they are called later from the main pattern.
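To see the DEFINE mechanism in isolation, here is a minimal sketch with a much smaller, hypothetical pattern (the names sq, num and value are made up for this illustration and are not part of the pattern above). It only shows that subpatterns declared in the DEFINE block are inert until they are called with \g<name>:

```php
<?php
// Hypothetical mini-pattern, for illustration only.
$pattern = <<<'EOD'
~
(?(DEFINE)
    (?<sq>    ' [^'\\]*+ (?: \\. [^'\\]*+ )*+ ' )  # single-quoted string
    (?<num>   \d+ )                                # integer literal
    (?<value> \g<sq> | \g<num> )                   # either kind of value
)
\g<sq> \s* => \s* \g<value>
~xs
EOD;

// Only the main pattern (the last line before the closing delimiter) consumes text.
preg_match_all($pattern, "['a' => 1, 'b' => 'x', 3 => 'c']", $m);
$pairs = $m[0];

print_r($pairs); // the two pairs whose key is a quoted string
```

The third pair is not reported because the main pattern requires the key to be a quoted string.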

The idea is to never look for a parenthesis inside a comment, inside a string, or among nested parentheses.

The first branch of the main pattern, (?:\g<com> | \g<string> ) (*SKIP)(*FAIL), skips every comment and string found before the next array declaration (or before the end of the input).
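The skip-and-fail idiom may be easier to follow on a reduced case. The sketch below is a simplification, not the full pattern: it protects only single-quoted strings and converts only empty array() calls. The first branch consumes a string, (*SKIP) forbids the engine from retrying at any position inside the consumed text, and (*FAIL) rejects the match, so the search resumes right after the string:

```php
<?php
// Simplified sketch: double quotes, heredocs and comments are NOT handled here.
$pattern = <<<'EOD'
~
' [^'\\]*+ (?: \\. [^'\\]*+ )*+ ' (*SKIP)(*FAIL)  # consume a string, then reject it
|
\b array \s* \( \s* \)                            # an empty array() outside strings
~xs
EOD;

$code   = "\$a = array(); \$b = 'array()'; \$c = array( );";
$result = preg_replace($pattern, '[]', $code);

echo $result, "\n"; // the array() inside the quoted string is left untouched
```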

The last line of the main pattern describes the array declaration itself. In detail:

(?<![-$])\b        # "array" must not follow "$" or "-" and must start at a word boundary (not a variable or part of a longer name)
array \s*\(
(                   # capture group 11
    (?:             # describe the possible content
        [^"'()/\#]+ # anything that is not a quote, a parenthesis, a slash, or a hash sign
      |             # OR
        \g<com>     # a comment
      |
        /           # a slash that is not a part of a comment
      |
        \g<string>  # a string
      |
        \g<nestedpar> # nested round brackets
    )*+
)
\)

pattern demo

code demo

About nested array declarations:

When array declarations are nested, the pattern only matches the outermost one.

The do...while loop deals with nested array declarations, because several nesting levels cannot be replaced in a single pass (there is a way with preg_replace_callback, but it isn't very handy). Each pass converts the outermost declarations that remain. To stop the loop, the last parameter of preg_replace is used: it receives the number of replacements performed in the target string, so the loop ends when a pass replaces nothing.
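For the curious, here is roughly what the preg_replace_callback route could look like. This is only a sketch under loud assumptions: the pattern is a simplified stand-in that balances parentheses but does not protect strings or comments, and shortenArrays is a hypothetical helper name. The callback re-runs the replacement on the captured body, so inner declarations are converted during the same call (arrow functions need PHP 7.4+):

```php
<?php
// Simplified stand-in pattern (assumption): balanced parentheses only,
// no protection for strings or comments.
$pattern = '~\barray\s*\(((?:[^()]++|\((?1)\))*+)\)~';

// Hypothetical helper: converts array(...) to [...] at every nesting level.
function shortenArrays(string $code, string $pattern): string
{
    $out = preg_replace_callback(
        $pattern,
        // Recurse into the captured body so nested array() calls are converted too.
        fn (array $m): string => '[' . shortenArrays($m[1], $pattern) . ']',
        $code
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $out ?? $code;
}

$result = shortenArrays('array(1, array(2, array()), in_array(3, array(4)))', $pattern);

echo $result, "\n"; // in_array( is kept as-is, every array( becomes [
```

With the full pattern you would have to recurse on capture group 11 instead of group 1, which is part of why the loop is the handier option.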

Upvotes: 4
