Reputation: 4212
I'm using Simple HTML DOM to scrape a news website. After a long search for a way to convert relative URLs to absolute ones, I finally got it working like this:
$url = 'http://www.nu.nl';
$html = file_get_html($url);
foreach ($html->find('a') as $element) {
    echo url_to_absolute($url, $element->href), "<br />";
}
The problem now is that this outputs only the href values as plain text. Simple HTML DOM has built-in functions like "outertext", "innertext" and so on to get the HTML itself. How do I use these functions in the code above? For instance, how do I echo the complete page (echo $html) with the code above applied so that the URLs are fixed?
Upvotes: 1
Views: 2293
Reputation: 2222
Not tested, but I think you can do something like this:
$url = 'http://www.nu.nl';
$html = file_get_html($url);
foreach ($html->find('a') as $element) {
    $element->href = url_to_absolute($url, $element->href);
}
echo $html->save();
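Since $element is (I'm assuming) a reference to the node inside the DOM tree rather than a copy, assigning to $element->href changes the tree itself. $html->save() then rebuilds the markup from that tree, so it should give you the modified source.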
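To illustrate the same idea without any third-party dependency, here is a minimal sketch that uses PHP's built-in DOMDocument instead of Simple HTML DOM. Note that to_absolute_sketch() is a hypothetical and deliberately naive stand-in for url_to_absolute(), and the sample markup is made up for the example. It only aims to show that changing an attribute on the node objects inside the loop changes the tree, so serialising the document afterwards should contain the rewritten links.

```php
<?php
// Hypothetical, naive stand-in for url_to_absolute(): it only joins the base
// and the path, which is enough for this illustration (no "../" handling).
function to_absolute_sketch(string $base, string $href): string
{
    // A URL that already has a scheme is treated as absolute and left alone.
    if (is_string(parse_url($href, PHP_URL_SCHEME))) {
        return $href;
    }
    return rtrim($base, '/') . '/' . ltrim($href, '/');
}

$base   = 'http://www.nu.nl';
// Made-up sample markup: one relative link and one absolute link.
$source = '<p><a href="/sport">Sport</a> <a href="http://example.com/x">X</a></p>';

$doc = new DOMDocument();
// The flags keep libxml from wrapping the fragment in <html><body> and a doctype.
$doc->loadHTML($source, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Each $a is a handle to a node in the tree, so setAttribute() edits the tree.
foreach ($doc->getElementsByTagName('a') as $a) {
    $a->setAttribute('href', to_absolute_sketch($base, $a->getAttribute('href')));
}

// Serialise the modified tree back to HTML.
$result = $doc->saveHTML();
echo $result;
```

In PHP, objects are passed around as handles, so no explicit reference (&) is needed in the foreach for the change to stick. That is the same assumption the Simple HTML DOM version above relies on.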
Upvotes: 1