Vivendi

Reputation: 21007

Advanced regex replacement. Check if in <script> tag

I have a simple regex which checks an entire string for a function declaration. So in this code:

public function Test($name)
{
    echo 'In test';
}

It will find the first part:

function Test($name)
{

And it replaces that with a custom piece:

function Test($name)
{
    echo 'New piece';

Which eventually makes my code look like this:

public function Test($name)
{
    echo 'New piece';
    echo 'In test';
}

This all works perfectly fine with this regex:

preg_match_all ( '/function(.*?)\{/s', $source, $matches )

The problem is that I want the regex to ignore everything inside a script tag. So in this case, with this source:

public function Test($name) //<--- Match found!
{
    echo 'In test';
}

<script type="text/javascript"> //<--- Script tag found, don't do any matches!
$(function() {
    function Test()
    {
        var bla = "In js";
    }
});
</script> //<--- Closed tag, start searching for matches again.

public function Test($name) //<--- Match found!
{
    echo 'In test';
}

How can I do this in my regex?

Upvotes: 2

Views: 944

Answers (3)

Edson Medina

Reputation: 10269

No regex is going to give you a decent fail-proof solution here.

The right way to do this is with PHP's tokenizer (token_get_all), which tells PHP code apart from inline HTML such as a <script> block.

<?php

$code = <<<END
<?php 
public function Test(\$name) //<--- Match found!
{
    echo 'In test';
}
?>
<script type="text/javascript"> //<--- Script tag found, don't do any matches!
$(function() {
    function Test()
    {
        var bla = "In js";
    }
});
</script> //<--- Closed tag, start searching for matches again.
<?php 
public function Bla(\$name) //<--- Match found!
{
    echo 'In test';
}
END;


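function injectCodeAtFunctionsStart ($originalCode, $code)
{
    $tokens = token_get_all ($originalCode);

    // must be an array: appending with [] to a string is a fatal error since PHP 7.1
    $newTokenTree = array();

    // iterate tokens
    for ($i = 0, $total = count($tokens); $i < $total; $i++) 
    {
        $node = $tokens[$i];
        $newTokenTree[] = $node;

        // function start; code outside the PHP tags is T_INLINE_HTML, so JS functions never match
        if (is_array ($node) && $node[0] === T_FUNCTION) 
        {
            // copy the signature up to and including the opening brace
            while (++$i < $total) {
                $newTokenTree[] = $tokens[$i];
                if ($tokens[$i] === '{' || $tokens[$i] === ';') {
                    break;
                }
            }

            // abstract and interface methods end in ';' and have no body
            if ($i >= $total || $tokens[$i] !== '{') {
                continue;
            }

            // keep space
            $space = "\n";
            if (isset ($tokens[$i + 1]) && is_array ($tokens[$i + 1]) && $tokens[$i + 1][0] === T_WHITESPACE) {
                $space = $tokens[++$i][1];
            }
            $newTokenTree[] = $space;

            // add new piece
            $newTokenTree[] = $code;
            $newTokenTree[] = $space;
        }
    }

    // rebuild code from tokens    
    $content = '';
    foreach ($newTokenTree as $node) {
        $content .= is_scalar ($node) ? $node : $node[1];
    }

    return $content;
}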


echo injectCodeAtFunctionsStart ($code, 'echo "new piece";');
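
To see why the tokenizer sidesteps the script problem, it can help to count the token types it produces. This is only a minimal sketch: the sample source and the helper name countTokenTypes are made up for illustration, while token_get_all and token_name are the standard tokenizer functions.

```php
<?php
// Hypothetical sample: one PHP function followed by an inline script block.
$source = <<<'END'
<?php
function Test($name)
{
    echo 'In test';
}
?>
<script type="text/javascript">
function Test()
{
    var bla = "In js";
}
</script>
END;

// Hypothetical helper: count tokens by type name.
// Single-character tokens (plain strings such as '{') are skipped.
function countTokenTypes($source)
{
    $counts = array();
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            $name = token_name($token[0]);
            $counts[$name] = isset($counts[$name]) ? $counts[$name] + 1 : 1;
        }
    }
    return $counts;
}

$counts = countTokenTypes($source);

// Only the PHP declaration is a T_FUNCTION token.
echo $counts['T_FUNCTION'], "\n"; // 1

// The script block is reported as inline HTML, not as PHP tokens.
var_dump(isset($counts['T_INLINE_HTML']));
```

Because the JavaScript function never shows up as T_FUNCTION, a token-based injector has nothing to skip explicitly.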

Upvotes: 0

morja

Reputation: 8550

As mentioned in the comments:

If your PHP functions always have a visibility modifier such as public, you could match on that:

(?:public|protected|private)\s+function\s+\w+\(.*?\)\s*\{

Otherwise, you could strip the script part first. Something like:

$text = preg_replace('/<script(?:(?!<\/script>).)*<\/script>/s','',$text);
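
Stripping removes the script blocks from the output as well, which may not be what you want. A possible variation, sketched here under assumptions, is to split the source on the same script pattern, apply the question's original regex only to the chunks outside the script blocks, and glue everything back together. The helper name injectOutsideScripts and the sample input are hypothetical.

```php
<?php
// Hypothetical helper: inject $code after the opening brace of every
// "function ... {" found outside <script>...</script> blocks.
function injectOutsideScripts($source, $code)
{
    // The capturing group makes preg_split keep the script blocks, so
    // even-numbered chunks are plain source and odd-numbered ones are scripts.
    $chunks = preg_split(
        '/(<script\b(?:(?!<\/script>).)*<\/script>)/si',
        $source,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );
    if ($chunks === false) {
        return $source; // regex error: leave the source unchanged
    }

    foreach ($chunks as $i => $chunk) {
        if ($i % 2 === 0) {
            // A callback avoids $ and \ in $code being read as backreferences.
            $chunks[$i] = preg_replace_callback(
                '/function(.*?)\{/s',
                function ($m) use ($code) {
                    return $m[0] . "\n    " . $code;
                },
                $chunk
            );
        }
    }

    return implode('', $chunks);
}

$source = "function a()\n{\n}\n<script>\nfunction b()\n{\n}\n</script>\nfunction c()\n{\n}\n";
$result = injectOutsideScripts($source, "echo 'New piece';");
echo $result;
```

The functions a and c receive the new line, while b inside the script block is left exactly as it was.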

Upvotes: 1

ilomambo

Reputation: 8350

I don't know PHP, but I know regex:

Your original regex is too permissive, since it also matches text like this:

// This is a functional comment { isn't it? }
             ^^^^^^^^...........^

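Making it stricter may solve your problem:

^\s*(public|protected|private)\s+function\s+\w+\s*\(.*?\).*?\{

Use it with the m modifier so that ^ matches at the start of every line, and the s modifier so that . can cross the line break before the brace. JavaScript functions carry no visibility modifier, so they are skipped. This covers the usual function declarations, although unusual cases (for example a static keyword placed between the modifier and function) can still slip past it.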
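
A minimal sketch of how a visibility-anchored pattern like this could drive the injection from the question. The helper name injectIntoMethods is hypothetical, and the pattern used here includes \w+ for the function name plus the m and s modifiers, which are assumptions on top of the answer.

```php
<?php
// Hypothetical helper: inject $code into every method whose declaration
// starts a line with a visibility modifier.
function injectIntoMethods($source, $code)
{
    // m: ^ matches at each line start; s: . may cross the newline before '{'
    $pattern = '/^\s*(?:public|protected|private)\s+function\s+\w+\s*\(.*?\).*?\{/ms';

    // A callback avoids $ and \ in $code being read as backreferences.
    return preg_replace_callback(
        $pattern,
        function ($m) use ($code) {
            return $m[0] . "\n    " . $code;
        },
        $source
    );
}

// Sample modelled on the question: two PHP methods around a script block.
$source = <<<'END'
public function Test($name)
{
    echo 'In test';
}

<script type="text/javascript">
$(function() {
    function Test()
    {
        var bla = "In js";
    }
});
</script>

public function Other($name)
{
    echo 'In test';
}
END;

$result = injectIntoMethods($source, "echo 'New piece';");
echo $result;
```

Only the two declarations that start with public are rewritten; the JavaScript functions do not match because they have no visibility modifier.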

Upvotes: 1
