grimnebluna

Reputation: 15

Replace unwanted characters inside opening HTML tag only

I need to do a little fix inside a script.

I need two specific characters (“ and ») inside opening iframe tags to be changed into double quotes (").

For example:

<iframe src=»http://test.test″>»hellohello»</iframe>

needs to become:

<iframe src="http://test.test">»hellohello»</iframe>

My code so far:

$content = preg_replace("/\<[“]\>/","\"",$content); 
$content = preg_replace("/\<[»]\>/","\"",$content); 

But this is not working as desired.

Upvotes: 0

Views: 2524

Answers (4)

mickmackusa

Reputation: 47972

To replace one or more of the rogue, multibyte characters inside the opening iframe tag (in an HTML-ignorant fashion), you can call strtr() or str_replace() inside preg_replace_callback().

echo preg_replace_callback(
         '/<[^>]+>/',
         fn($m) => strtr($m[0], ['»' => '"', '“' => '"']),
         $tests
     );

Or

echo preg_replace_callback(
         '/<[^>]+>/',
         fn($m) => str_replace(['»', '“'], '"', $m[0]),
         $tests
     );

Because your HTML is "broken"/invalid, it's probably not worth trying to use a proper DOM parser to correct the markup.
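A self-contained sketch of the callback approach, run against the sample string from the question. The helper name fixTagQuotes is hypothetical, and the extra ″ (double prime) entry in the map is an assumption: the question's sample closes the attribute with that character, which a map containing only » and “ would leave untouched.

```php
<?php
// Hypothetical helper: normalise rogue quote characters, but only inside tags.
function fixTagQuotes(string $html): string
{
    // Rogue characters mapped to a plain double quote.
    // The double prime (″) is an assumption taken from the question's sample.
    $map = ['»' => '"', '“' => '"', '″' => '"'];

    // Match everything from "<" up to the next ">" and fix only that slice.
    // Fall back to the untouched input if the regex engine reports an error.
    return preg_replace_callback(
        '/<[^>]+>/',
        fn(array $m): string => strtr($m[0], $map),
        $html
    ) ?? $html;
}

$sample = '<iframe src=»http://test.test″>»hellohello»</iframe>';
echo fixTagQuotes($sample), PHP_EOL;
```

Text between the tags is never passed to strtr(), so the » characters around hellohello are left alone.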

Upvotes: 0

Daimos

Reputation: 1473

Your regex is wrong.

$content = preg_replace("/\<[“]\>/","\"",$content); 

It means that exactly:

<“> 

will be replaced with a quote. A working example from another site:

$content = preg_replace('/<([^<>]+)>/e', '"<" .str_replace("“", \'"\', "$1").">"', $content); 

Here str_replace is used, and you can pass any quote characters to it. You should do the same thing with preg_replace_callback, which is recommended for newer PHP versions (the /e flag is deprecated as of PHP 5.5 and was removed in PHP 7.0). Example (not sure it works as-is, but you get the idea):

$content = preg_replace_callback(
        '/<([^<>]+)>/',
        function ($matches) {
            // $matches[0] is the whole tag, including the angle brackets
            return str_replace('OldQuote', 'NewQuote', $matches[0]);
        },
        $content
    );

Or, with many different quotes, create an array:

$content = preg_replace_callback(
        '/<([^<>]+)>/',
        function ($matches) {
            // keys are the characters to find, values are their replacements
            $quotes = array('OldQuote' => 'NewQuote', 'OldQuote2' => 'NewQuote2');
            return str_replace(array_keys($quotes), array_values($quotes), $matches[0]);
        },
        $content
    );

Upvotes: 3

Ashish Choudhary

Reputation: 2034

One solution is not to use preg_replace at all. You can simply use str_replace if the format is exactly as you have described in the question.

$str = '<iframe src=»http://test.test″>»hellohello»</iframe>';
// Each search string needs its own replacement; keep the "=" and the tag ending.
$repl = str_replace(array('=»', '″>', '″/>'), array('="', '">', '"/>'), $str);
print_r($repl);

Upvotes: -1

nessuno

Reputation: 27050

This should do the trick

$content = preg_replace('/<(.+?)(?:»|“|″)(.+?)>/','<\1"\2>', $content);

One single regexp, matching anything containing », “ or ″ between < and >. It is replaced with \1 (the first capturing group), followed by " and \2 (the second capturing group).
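One thing worth checking with a pattern of this shape: each match swaps a single rogue character, so a tag containing two of them still has one left after a single pass. The sketch below repeats the replacement until a pass changes nothing. Narrowing .+? to [^<>] is an assumption added here, not part of the answer, so that a match cannot run across several tags.

```php
<?php
// Same idea as the pattern above, but restricted to [^<>] (an assumption)
// so that one match can never span more than one tag.
$pattern = '/<([^<>]*?)(?:»|“|″)([^<>]*)>/';
$content = '<iframe src=»http://test.test″>»hellohello»</iframe>';

// A single pass replaces at most one rogue character per tag.
$onePass = preg_replace($pattern, '<\1"\2>', $content);
echo $onePass, PHP_EOL;

// Repeat until a pass reports zero replacements.
do {
    $content = preg_replace($pattern, '<\1"\2>', $content, -1, $count);
} while ($count > 0);
echo $content, PHP_EOL;
```

The first echo still shows the ″ before the closing >, while the second shows both attribute quotes converted and the text between the tags unchanged.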

I hope it helps

Upvotes: -1
