Youss

Reputation: 4212

Get "outertext" HTML with simple html dom

I'm using simple html dom to scrape a news website. After a long search for a way to convert relative URLs to absolute ones, I finally got it working like this:

$url = 'http://www.nu.nl';

$html = file_get_html($url);
foreach($html->find('a') as $element) {
    echo url_to_absolute($url, $element->href), "<br />";
}

The problem now is that this outputs each href as plain text. Simple html dom has built-in properties such as "outertext" and "innertext" that return the HTML itself. How do I use these in the code above? For instance, how do I echo the complete page (echo $html) with the fixed URLs included?

Upvotes: 1

Views: 2293

Answers (1)

EaterOfCode

Reputation: 2222

Not tested, but I think you can do something like this:

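$url = 'http://www.nu.nl';

$html = file_get_html($url);
foreach($html->find('a') as $element) {
    // Overwrite the attribute in the DOM instead of echoing the URL.
    $element->href = url_to_absolute($url, $element->href);
}
// Serialize the modified DOM tree back to an HTML string.
echo $html->save();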

Since $element is a reference to the node inside the DOM tree (I'm assuming), and $html->save() rebuilds the HTML from that tree, the output should be the page source with the modified links.
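
If you want to try the "modify the attribute, then serialize the whole tree" idea without installing anything, here is a rough sketch that swaps in PHP's built-in DOMDocument for simple html dom. It is not the simple html dom API. The helper to_absolute_sketch() is a hypothetical, simplified stand-in for url_to_absolute(), and it assumes the base URL points at a directory rather than a file. The sample markup is made up for illustration.

```php
<?php
// Hypothetical, simplified stand-in for url_to_absolute().
// Assumption: $base is a directory-style URL such as 'http://www.nu.nl'.
// It does not resolve "../" segments, queries, or fragments.
function to_absolute_sketch(string $base, string $href): string
{
    // Already absolute (has a scheme such as http: or mailto:).
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return $href;
    }
    $parts = parse_url($base);
    // Protocol-relative URL: reuse the scheme of the base.
    if (strpos($href, '//') === 0) {
        return $parts['scheme'] . ':' . $href;
    }
    // Root-relative path: attach to scheme and host only.
    if (strpos($href, '/') === 0) {
        return $parts['scheme'] . '://' . $parts['host'] . $href;
    }
    // Plain relative path: append to the base.
    return rtrim($base, '/') . '/' . $href;
}

$base = 'http://www.nu.nl';
// Made-up sample markup instead of a live download.
$source = '<p><a href="/sport">Sport</a> <a href="http://example.com/">Ext</a></p>';

$doc = new DOMDocument();
$doc->loadHTML($source); // real-world pages may trigger parser warnings here

foreach ($doc->getElementsByTagName('a') as $a) {
    // Rewrite the attribute on the node itself, not a copy of it.
    $a->setAttribute('href', to_absolute_sketch($base, $a->getAttribute('href')));
}

// Serialize the whole modified tree, links included.
echo $doc->saveHTML();
```

Note that DOMDocument wraps a bare fragment like this one in a doctype and html/body elements when it serializes, so the output is a full document rather than the original snippet.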

Upvotes: 1
