ehime

Reputation: 8375

PHP Regex missing two matches

I almost have my regular expression down for skimming HTML pages, but I have run into two issues that I am trying to get squashed before I can proceed. I need to be able to match both an empty value and a lone slash (each followed immediately by the closing quote), but I have exhausted my ability to see what I'm doing wrong. Could someone help me with the final bit?

$pathspec='in-front';

$subjects = array(
    '<base href="http://foo.com/images/" target="_blank">', # no changes              (correct)
    '<base href="/" target="_blank">',                      # '/in-front/'            (fails)
    '<a href="https://foo.com/images/">Foo</a>',            # no changes              (correct)
    '<a href="">Foo</a>',                                   # '/in-front/'            (fails)
    '<img src="bar/foo.png" />',                            # no changes              (correct)
    '<img src="/bar/foo.png" />',                           # '/in-front/bar/foo.png' (correct)
);


foreach ($subjects as $subject) {
    echo preg_replace( '/(href|src)=["\']?\/(?!\/)([^"\'>]+)["\']?/', "$1='/$pathspec/$2'", $subject ) . "\n";
}

die;

The expected output is shown in the comments. Thank you.

Upvotes: 2

Views: 104

Answers (2)

Casimir et Hippolyte

Reputation: 89557

You can use this pattern:

$pattern = '~\b(?:href|src)\s*=\s*(["\']?+)\K(?:/|(?=[\s>]|\1))~i';
$replacement = "/$pathspec/";

$result = preg_replace($pattern, $replacement, $subject);
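As a rough check, the pattern above can be run against the six subjects from the question. This is only a sketch: the wrapper function `prefixRootPaths` is a hypothetical name introduced here for illustration and is not part of the answer. Because `\K` discards everything matched before it, only the leading slash (or the empty position before the closing quote) is replaced, so the original quote style is kept.

```php
<?php
// Hypothetical helper (not from the answer): applies the answer's pattern to one tag.
function prefixRootPaths(string $subject, string $pathspec): string
{
    // (["\']?+) captures an optional opening quote possessively;
    // \K drops everything matched so far from the reported match;
    // then either a leading "/" is consumed, or an empty value is detected
    // by looking ahead for whitespace, ">", or the captured quote.
    $pattern = '~\b(?:href|src)\s*=\s*(["\']?+)\K(?:/|(?=[\s>]|\1))~i';

    return preg_replace($pattern, "/$pathspec/", $subject);
}

$subjects = array(
    '<base href="http://foo.com/images/" target="_blank">',
    '<base href="/" target="_blank">',
    '<a href="https://foo.com/images/">Foo</a>',
    '<a href="">Foo</a>',
    '<img src="bar/foo.png" />',
    '<img src="/bar/foo.png" />',
);

foreach ($subjects as $subject) {
    echo prefixRootPaths($subject, 'in-front') . "\n";
}
```

Note that this pattern has no `(?!\/)` guard, so a protocol-relative URL such as `//foo.com/x` would also get the prefix; whether that matters depends on the pages being processed.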

Upvotes: 1

php_nub_qq

Reputation: 16045

See if this works for you:

$result = preg_replace('#(href|src)=["\'](?:/|/(?!\/)(\S+?)|)["\']#', "$1='/$pathspec/$2'", $subject);
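A quick way to try this against the subjects from the question is sketched below. The wrapper name `prefixQuotedAttributes` is hypothetical and only serves the illustration. The three alternatives cover a lone slash, a root-relative path (captured as `$2`), and an empty value; when `$2` did not participate in the match, `preg_replace` substitutes an empty string for it.

```php
<?php
// Hypothetical helper (not from the answer): applies the answer's pattern to one tag.
function prefixQuotedAttributes(string $subject, string $pathspec): string
{
    // Alternatives inside the quotes: "/" alone, "/" followed by a path
    // that does not start with a second "/", or nothing at all.
    $pattern = '#(href|src)=["\'](?:/|/(?!\/)(\S+?)|)["\']#';

    // $1 and $2 are left literal by PHP's string interpolation and are
    // resolved by preg_replace as backreferences; $pathspec is interpolated.
    return preg_replace($pattern, "$1='/$pathspec/$2'", $subject);
}

$subjects = array(
    '<base href="http://foo.com/images/" target="_blank">',
    '<base href="/" target="_blank">',
    '<a href="https://foo.com/images/">Foo</a>',
    '<a href="">Foo</a>',
    '<img src="bar/foo.png" />',
    '<img src="/bar/foo.png" />',
);

foreach ($subjects as $subject) {
    echo prefixQuotedAttributes($subject, 'in-front') . "\n";
}
```

Because the replacement rewrites the whole attribute, matched attributes come out with single quotes regardless of the quotes used in the input, and unquoted attribute values are not matched at all.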

Upvotes: 2
