Reputation: 2485
I am trying to grab content from another one of my sites. This works fine, except that all the links are incorrect.
include_once('../simple_html_dom.php');
$page = file_get_html('http://www.website.com');
$ret = $page->find('div[id=header]');
echo $ret[0];
Is there any way to make each link show the full URL instead of the link as it appears on the original site, using preg_replace?
$ret[0] = preg_replace('@(http://([\w-.]+)+(:\d+)?(/([\w/_.]*(\?\S+)?)?)?)@',
'<a href="$1">http://fullwebsitellink.com$1</a>', $ret[0]);
I guess it would be something like the above, but I don't understand how to write it.
Thanks
Upvotes: 0
Views: 884
Reputation: 97994
Your question doesn't really explain what is "incorrect" about the links, but I'm guessing you have something like this:
<div id="header"><a href="/">Home</a> | <a href="/sitemap">Sitemap</a></div>
and you want to embed it in another site, where those links need to be fully-qualified with a domain name, like this:
<div id="header"><a href="http://example.com/">Home</a> | <a href="http://example.com/sitemap">Sitemap</a></div>
Assuming this is the case, the replacement you want is so simple you don't even need a regex: find all href attributes beginning with "/", and prepend the domain part (I'll use "http://example.com") to make them absolute:
$scraped_html = str_replace('href="/', 'href="http://example.com/', $scraped_html);
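One caveat with the plain str_replace is that it would also rewrite protocol-relative links such as href="//cdn.example.com/x" and would miss single-quoted attributes. If that matters, a regex variant along the following lines may help. This is only a sketch: the function name absolutize_root_links and its $base parameter are made up for illustration, and $base is assumed to be the site root without a trailing slash.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): prefixes root-relative
// href values with $base, which is assumed to have no trailing slash.
function absolutize_root_links(string $html, string $base): string
{
    // Match href= followed by either quote style and exactly one leading slash.
    // The (?!/) lookahead leaves protocol-relative URLs (href="//host/...") alone.
    $pattern = '@\bhref=(["\'])/(?!/)@i';

    // preg_quote-style escaping is not needed for the replacement string, but
    // backslashes and dollar signs in $base would be; plain URLs are assumed here.
    return preg_replace($pattern, 'href=${1}' . $base . '/', $html);
}

$scraped_html = '<div id="header"><a href="/">Home</a> | <a href="/sitemap">Sitemap</a></div>';
echo absolutize_root_links($scraped_html, 'http://example.com');
```

With the code from the question, the input would presumably be the header element converted to a string, for example `(string) $ret[0]`. Links that are relative without a leading slash (such as href="sitemap") are not touched by either version.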
Upvotes: 3