Reputation: 345
I first ran the code on MAMP and it worked fine. But when I tried to run it on another server, I got a lot of warnings like:
Warning: DOMDocument::loadHTML(): Unexpected end tag : head in Entity, line: 3349 in /cgihome/zhang1/html/cgi-bin/getPrice.php on line 17
Warning: DOMDocument::loadHTML(): htmlParseStartTag: misplaced tag in Entity, line: 3350 in /cgihome/zhang1/html/cgi-bin/getPrice.php on line 17
Warning: DOMDocument::loadHTML(): Tag header invalid in Entity, line: 3517 in /cgihome/zhang1/html/cgi-bin/getPrice.php on line 17
The code is as follows:
<?php
$amazon = file_get_contents('http://www.amazon.com/blablabla');
$doc = new DOMdocument();
$doc->loadHTML($amazon);
$doc->saveHTML();
$price = $doc -> getElementById('actualPriceValue')->textContent;
$ASIN = $doc -> getElementById('ASIN')->getAttribute('value');
?>
Does anyone know what's going on? Thanks!
Upvotes: 34
Views: 45771
Reputation: 197933
To disable the warning, you can use
libxml_use_internal_errors(true);
This works for me (see the manual). Read on:
Background: You are loading invalid HTML. Invalid HTML is quite common; DOMDocument::loadHTML corrects most of the problems but emits warnings by default.
With libxml_use_internal_errors you can control that behavior. Set it before loading the document:
$previously = libxml_use_internal_errors(true);
$doc->loadHTML($amazon);
Then after loading you can deal with the errors (if you want/need to):
/* @var LibXMLError[] $xmlErrors */
$xmlErrors = libxml_get_errors();
And finally clear them (as they will add up) and restore the previous setting if applicable:
unset($xmlErrors);
libxml_clear_errors();
libxml_use_internal_errors($previously);
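Put together, the three steps above might look like the following minimal sketch. The helper name loadHtmlQuietly and the sample markup are made up for illustration and are not part of the original answer; the try/finally is just one way to make sure the buffer is cleared and the previous setting restored.

```php
<?php
// Hypothetical helper (not from the original answer): parse HTML while
// collecting libxml errors instead of letting them surface as PHP warnings.
function loadHtmlQuietly(string $html): array
{
    $doc = new DOMDocument();

    // Buffer libxml errors internally and remember the previous setting.
    $previously = libxml_use_internal_errors(true);

    try {
        $doc->loadHTML($html);
        // Copy the buffered LibXMLError objects before the buffer is cleared.
        $errors = libxml_get_errors();
    } finally {
        // Always empty the buffer and restore the caller's setting.
        libxml_clear_errors();
        libxml_use_internal_errors($previously);
    }

    return [$doc, $errors];
}

// Sample markup with a stray closing tag, invented for this example.
$html = '<html><body><p id="actualPriceValue">$9.99</p></span></body></html>';

[$doc, $errors] = loadHtmlQuietly($html);

foreach ($errors as $error) {
    // Each LibXMLError exposes level, code, line, column and message.
    printf("line %d: %s\n", $error->line, trim($error->message));
}

$node = $doc->getElementById('actualPriceValue');
echo $node !== null ? $node->textContent : 'element not found', "\n";
```

Whether a given piece of markup produces any errors depends on the libxml2 version installed, so the loop may print nothing on some servers.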
References
libxml_use_internal_errors: Disable libxml errors and allow user to fetch error information as needed
libxml_clear_errors: Clear libxml error buffer
libxml_get_errors: Retrieve array of errors
LibXMLError: The LibXMLError class
Upvotes: 137
Reputation: 19170
You can suppress the warning like this:
@$doc->loadHTML($amazon);
Upvotes: 6
Reputation: 2405
This problem is caused by markup that is not clean XHTML.
DOMDocument::loadHTML warns about markup that is not well-formed, so one way to avoid the warnings is to clean up the markup before loading it.
PHP has an extension that does the job pretty well, called Tidy: php.net/book.tidy
It might be tricky, as you may need to enable it in your php.ini.
Then:
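// Tidy options: emit XHTML, keep only the body content, disable line wrapping.
$tidy_config = array(
'clean' => true,
'output-xhtml' => true,
'show-body-only' => true,
'wrap' => 0,
);
// $html holds the raw markup, e.g. the result of file_get_contents().
$tidy = tidy_parse_string( $html, $tidy_config, 'utf8');
$tidy->cleanRepair();
// Load the repaired markup instead of the original.
$doc = new DOMDocument();
$doc->loadHTML( (string) $tidy);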
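Since Tidy may not be enabled on every server, a guarded variant could look like the sketch below. The helper name cleanMarkup and the sample markup are hypothetical. The fallback of passing the markup through unchanged, and the use of libxml_use_internal_errors to keep the load quiet in that case, are assumptions of this example rather than part of the answer above.

```php
<?php
// Hypothetical helper: clean markup with Tidy when the extension is loaded,
// otherwise return the markup unchanged.
function cleanMarkup(string $html): string
{
    if (!extension_loaded('tidy')) {
        // Tidy is not enabled on this server; nothing to clean with.
        return $html;
    }

    $tidy = tidy_parse_string($html, [
        'clean'          => true,
        'output-xhtml'   => true,
        'show-body-only' => true,
        'wrap'           => 0,
    ], 'utf8');

    if ($tidy === false) {
        // Parsing failed; fall back to the raw markup.
        return $html;
    }

    $tidy->cleanRepair();

    return (string) $tidy;
}

// Sample markup with a stray closing tag, invented for this example.
$html = '<html><body><p id="actualPriceValue">$9.99</p></span></body></html>';

// Buffer libxml errors so the fallback path does not emit warnings either.
$previously = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML(cleanMarkup($html));
libxml_clear_errors();
libxml_use_internal_errors($previously);

echo extension_loaded('tidy')
    ? "Tidy cleaned the markup\n"
    : "Tidy not loaded; parsed the raw markup\n";

$node = $doc->getElementById('actualPriceValue');
echo $node !== null ? trim($node->textContent) : 'element not found', "\n";
```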
Upvotes: 6