Reputation: 1
What I want is simple: get a webpage's HTML and scrape all outbound links.
What I have so far is:
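<?php
// Fetch the response body of $URL as a string; returns false on failure.
function get_content($URL){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
    curl_setopt($ch, CURLOPT_URL, $URL);
    $data = curl_exec($ch);
    if ($data === false) {
        error_log('cURL error: ' . curl_error($ch)); // record why the request failed
    }
    curl_close($ch);
    return $data;
}
$html = get_content('http://example.com');
?>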
Upvotes: 0
Views: 734
Reputation: 68556
Use DOMDocument to parse the HTML, then read the href attribute of every <a> element:
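$dom = new DOMDocument;
libxml_use_internal_errors(true); // suppress warnings from malformed real-world HTML
$dom->loadHTML($html); // pass the HTML content you retrieved from get_content()
foreach ($dom->getElementsByTagName('a') as $tag) {
    echo $tag->getAttribute('href'), PHP_EOL; // print one link per line
}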
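The loop above prints every link, internal ones included. If "outbound" means links that point to a different host, one possible way to filter them is sketched below. The function name get_outbound_links and the sample markup are made up for illustration, and the rule "host differs from the page's host" is an assumption about what the question means by outbound.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the links in
// $html whose host differs from the host of $baseUrl.
function get_outbound_links(string $html, string $baseUrl): array {
    if ($html === '') {
        return []; // nothing to parse
    }
    $baseHost = strtolower((string) parse_url($baseUrl, PHP_URL_HOST));

    $dom = new DOMDocument;
    $previous = libxml_use_internal_errors(true); // silence warnings from malformed HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $tag) {
        $href = trim($tag->getAttribute('href'));
        $host = parse_url($href, PHP_URL_HOST);
        // Relative links (and mailto:, #fragment, etc.) have no host,
        // so they are treated as staying on the same site.
        if (!is_string($host) || $host === '') {
            continue;
        }
        if (strtolower($host) !== $baseHost) {
            $links[] = $href;
        }
    }
    // Drop duplicates and reindex from 0.
    return array_values(array_unique($links));
}

// Sample markup invented for this example.
$sample = '<html><body><a href="/about">About</a>'
        . '<a href="http://example.com/contact">Contact</a>'
        . '<a href="https://www.php.net/manual/">PHP manual</a></body></html>';
print_r(get_outbound_links($sample, 'http://example.com'));
```

With the real page you would call get_outbound_links($html, 'http://example.com') on the string returned by get_content(). Note that this comparison is exact, so a subdomain such as www.example.com counts as outbound; loosen the check if that is not what you want.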
Upvotes: 1