Reputation: 89
How can I fix the code below? It collects all the links on a web page, but it doesn't work on some websites, such as the one used here.
<?php
$html = file_get_contents('http://blogfa.com/members/updated.aspx');
$dom = new DOMDocument();
@$dom->loadHTML($html);
// grab all the links on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    echo $url . '<br />';
}
?>
Upvotes: 0
Views: 468
Reputation: 394
You are actually getting the links, but PHP also raises a warning. To fix this, you only need to add one line. This is the warning I get:
E_WARNING : type 2 -- DOMDocument::loadHTML(): htmlParseStartTag: misplaced <body> tag in Entity, line: 20 -- at line 6
Solution:
<?php
$html = file_get_contents('http://blogfa.com/members/updated.aspx');
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
// grab all the links on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    echo $url . '<br />';
}
?>
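libxml_use_internal_errors(true); disables libxml's standard error reporting, so parse errors are stored internally instead of being emitted as PHP warnings. The document is still parsed and the links are still extracted.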
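If you want to see what the parser complained about rather than only silencing it, the stored errors can be read back with libxml_get_errors() and discarded with libxml_clear_errors(). Here is a minimal sketch of that pattern. The $html string is a made-up sample with a second <body> tag (an assumption standing in for the real page), so it runs without a network request:

```php
<?php
// Hypothetical sample markup with a second, misplaced <body> tag.
$html = '<html><head><title>Sample</title></head>'
      . '<body><a href="/first">first</a>'
      . '<body><a href="/second">second</a></body></html>';

$dom = new DOMDocument();

// Collect libxml errors internally instead of raising PHP warnings.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
$errors = libxml_get_errors();          // array of LibXMLError objects
libxml_clear_errors();                  // empty the internal error buffer
libxml_use_internal_errors($previous);  // restore the earlier setting

// Extract the href of every anchor inside <body>.
$xpath = new DOMXPath($dom);
$links = [];
foreach ($xpath->query('/html/body//a') as $a) {
    $links[] = $a->getAttribute('href');
}

// Report whatever the parser recorded, then the links.
foreach ($errors as $error) {
    echo 'Parse issue on line ' . $error->line . ': ' . trim($error->message) . "\n";
}
echo implode("\n", $links) . "\n";
```

Restoring the previous setting afterwards keeps the change local to this one parse, which may matter if other code in the same request relies on libxml warnings.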
Upvotes: 1
Reputation: 2003
When I run your code I get the following PHP warning:
E_WARNING : type 2 -- DOMDocument::loadHTML(): htmlParseStartTag: misplaced <body> tag in Entity, line: 20 -- at line 6
If you look at the source code of your page at http://blogfa.com/members/updated.aspx, you'll see that the <body> tag is opened twice.
Try removing the second <body> tag. Other than this, your code seems to work.
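If you cannot edit the page itself, one possible way to apply this is to drop the extra opening <body> tags from the fetched string before handing it to DOMDocument. The sketch below is only an illustration: removeExtraBodyTags is a hypothetical helper name, the sample markup is made up, and a regular expression like this is fragile on arbitrary HTML (for example, it would also match "<body" inside a script or comment).

```php
<?php
// Hypothetical helper: keep the first opening <body> tag, remove any later ones.
function removeExtraBodyTags(string $html): string
{
    $seen = 0;
    return preg_replace_callback(
        '/<body\b[^>]*>/i',
        function (array $match) use (&$seen) {
            // The first match is returned unchanged; later matches are dropped.
            return $seen++ === 0 ? $match[0] : '';
        },
        $html
    );
}

// Made-up sample input with two opening <body> tags.
$sample  = '<html><body class="a"><p>x</p><body><a href="/y">y</a></body></html>';
$cleaned = removeExtraBodyTags($sample);

echo $cleaned . "\n";
```

The cleaned string can then be passed to $dom->loadHTML() in place of the raw download.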
Upvotes: 0