I have some PHP code that grabs a website's HTML and echoes it to the screen. I'm looking for a way to scan the HTML and replace every href value with another value. For example, http://somepage.com contains the HTML <a href="http://somepage.com/somepage">Click me</a>, but the href value could change at any time. I want to echo the same HTML, but with the href value replaced by http://mywebsite.com/somepage. How can I do that? I have this so far:
// rawurlencode() encodes spaces as %20 and escapes other characters that are unsafe in a URL path
$q = rawurlencode($_GET['q'] ?? '');
$html2 = "https://somewebsite.com/search/" . $q;
$html = file_get_contents($html2);
echo $html;
I also tried DOMDocument::loadHTMLFile(), but it fails with:
Warning: DOMDocument::loadHTMLFile(): I/O warning : failed to load external entity
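Since the page is already fetched into $html by file_get_contents(), one option is to parse that string with DOMDocument::loadHTML() instead of loadHTMLFile(), then rewrite each link's href. This is a sketch, not tested against the real site: it assumes every <a> should point at http://mywebsite.com with its original path kept (query strings and fragments are dropped), and the hard-coded sample markup is only a stand-in for the fetched page.

```php
<?php
// Sketch: point every <a href> at mywebsite.com, keeping only the original path.
// Assumption: $html is the markup already fetched with file_get_contents();
// a hard-coded sample stands in for it here.
$html = '<p><a href="http://somepage.com/somepage">Click me</a></p>';

$doc = new DOMDocument();
// Collect libxml warnings about imperfect real-world HTML instead of printing them
libxml_use_internal_errors(true);
// Don't add implied <html>/<body> wrappers or a default doctype
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

foreach ($doc->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    // Keep only the path of the original URL, e.g. "/somepage"
    $path = parse_url($href, PHP_URL_PATH) ?? '';
    $link->setAttribute('href', 'http://mywebsite.com' . $path);
}

echo $doc->saveHTML();
```

Parsing the markup means quoting style, attribute order and extra whitespace inside the tag no longer matter, which is where a regex tends to break on real pages.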
You can use preg_replace() to match each link's href with a regular expression and rewrite it, like this:
<?php
// example page contents
$pageContents = '<a href="http://somepage.com/somepage">Click me</a>Some example text.
<div>Example div <a href="http://anotherDomain.com/somepage2">Another link</a>.</div>';
// ------ the Search pattern explanation -------
// (http:\/\/)? means that the http:// may or may not exist (group 1)
// [\w.]+ matches the domain name, e.g. somepage.com
// (\w+) captures the single path segment after the domain (group 2)
// \s? means there may or may not be a whitespace character before the >
// ------ the Replace pattern explanation -------
// replace the matched expression with the provided replacement
// $2 is the second parenthesized group () from the search pattern
$html = preg_replace('/<a href="(http:\/\/)?[\w.]+\/(\w+)"\s?>/', '<a href="http://mywebsite.com/$2">', $pageContents);
echo $html;
?>
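which outputs the following HTML, with both links now pointing at mywebsite.com:
<a href="http://mywebsite.com/somepage">Click me</a>Some example text.
<div>Example div <a href="http://mywebsite.com/somepage2">Another link</a>.</div>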
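The regex above only matches http:// links whose path is a single word segment, so a URL like https://somepage.com/some/page?id=1 is left unchanged. A more tolerant sketch uses preg_replace_callback() with parse_url() to swap the scheme and host while keeping the path, query and fragment. The function name rewriteHrefs is hypothetical; it assumes double-quoted href attributes, leaves relative URLs alone, and also rewrites href on tags other than <a>, such as <link>.

```php
<?php
// Sketch: replace scheme and host of every double-quoted href, keeping path, query, fragment.
// rewriteHrefs() is a hypothetical helper name, not a PHP built-in.
function rewriteHrefs(string $html, string $newBase): string
{
    return preg_replace_callback(
        // (?<![\w-]) stops attributes such as data-href="..." from matching
        '/(?<![\w-])href="([^"]*)"/i',
        function (array $m) use ($newBase): string {
            $parts = parse_url($m[1]);
            if ($parts === false || !isset($parts['host'])) {
                return $m[0]; // relative or malformed URL: leave it unchanged
            }
            $url = $newBase . ($parts['path'] ?? '');
            if (isset($parts['query'])) {
                $url .= '?' . $parts['query'];
            }
            if (isset($parts['fragment'])) {
                $url .= '#' . $parts['fragment'];
            }
            return 'href="' . $url . '"';
        },
        $html
    ) ?? $html; // preg_replace_callback() returns null on a regex error
}

$pageContents = '<a href="https://somepage.com/some/page?id=1">Click me</a>'
    . '<a href="/local">Local</a>';
echo rewriteHrefs($pageContents, 'http://mywebsite.com');
// prints <a href="http://mywebsite.com/some/page?id=1">Click me</a><a href="/local">Local</a>
```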