I am trying to extract the image tags from some HTML code.
I have:
$parser = new DOMDocument;
$parser->loadHTML($this->html);
foreach ($parser->getElementsByTagName('img') as $imgNode) {
    echo $parser->saveHTML($imgNode);
}
$this->html contains a large amount of HTML and JavaScript, for example:
<div id='someid'>
<button id='bt' onclick='clickme()'>click me</button>
<img src='test.jpg'/>
.....
.....
more...
</div>
<div>
.....
.....
more...
I got a warning saying:
DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity,
I am not sure how to fix this, and I don't know whether there is a better way to extract all the images from such a large chunk of HTML.
Any ideas? Thanks a lot!
I am in no way an expert on these matters (yet), but I hope this helps in some way.
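According to this answer by troelskn, you can make the DOM parser more tolerant of badly formed HTML by calling libxml_use_internal_errors(true) before loadHTML(). That should get rid of the warning: libxml still recovers from the broken markup, but its errors are buffered internally instead of being reported as PHP warnings.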
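A minimal sketch of that suggestion, assuming the DOM extension is enabled. The sample HTML is made up: a bare "&" in a URL query string is a common trigger for the "htmlParseEntityRef: expecting ';'" warning, but the exact messages libxml reports vary between libxml versions, so the code only counts them.

```php
<?php
// Hypothetical markup: the unescaped "&" in the query string is malformed HTML.
$html = "<div><a href='page.php?a=1&b=2'>link</a><img src='test.jpg'/></div>";

$document = new DOMDocument();

// Buffer libxml errors instead of emitting PHP warnings.
// This must happen BEFORE loadHTML(); keep the old setting so it can be restored.
$previous = libxml_use_internal_errors(true);
$document->loadHTML($html);

// Optionally look at what libxml complained about, then clear the buffer.
$messages = array_map(
    function (LibXMLError $error) {
        return trim($error->message);
    },
    libxml_get_errors()
);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The document is still usable: libxml recovered from the bad markup.
echo count($messages), " parse error(s); ",
    $document->getElementsByTagName('img')->length, " <img> element(s)\n";
```

Restoring the previous setting keeps the change local, so other code that relies on libxml warnings is not affected.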
All the images in a document can be found with DOMXPath. Its constructor takes a DOMDocument, and the resulting object lets you run XPath queries against that document.
$document = new DOMDocument();
// Suppress parse warnings; this must be called before loadHTML().
libxml_use_internal_errors(true);
$document->loadHTML($your_html);
// Discard the buffered parse errors.
libxml_clear_errors();
$xpath = new DOMXPath($document);
// Find all img tags.
$img_nodes = $xpath->query('//img');
DOMXPath::query returns a DOMNodeList. You can loop over it by index with DOMNodeList::item, which returns a DOMNode.
for ($i = 0; $i < $img_nodes->length; $i++)
{
    $node = $img_nodes->item($i);
    // Manipulate the node, e.g. $node->getAttribute('src').
}
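Putting the pieces together, here is a self-contained sketch. $html is a hypothetical stand-in for $this->html / $your_html, and the variable names $tags and $sources are illustrative. For each image it collects both the serialized <img> tag (what the question's saveHTML() call produced) and just its src attribute.

```php
<?php
// Hypothetical input standing in for $this->html.
$html = "<div id='someid'>
<button id='bt' onclick='clickme()'>click me</button>
<img src='test.jpg'/>
<a href='page.php?a=1&b=2'><img src='other.png' alt='x'></a>
</div>";

$document = new DOMDocument();

// Collect parse errors internally so malformed markup does not raise warnings.
$previous = libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($document);
$img_nodes = $xpath->query('//img');

$tags = [];
$sources = [];
for ($i = 0; $i < $img_nodes->length; $i++) {
    $node = $img_nodes->item($i);
    // Serialize the <img> element back to HTML, as in the question.
    $tags[] = $document->saveHTML($node);
    // Or read a single attribute from the element.
    $sources[] = $node->getAttribute('src');
}

foreach ($sources as $src) {
    echo $src, "\n";
}
```

Run as-is, this prints test.jpg and then other.png, in document order. Since DOMNodeList is traversable, a foreach over $img_nodes would work just as well as the indexed loop.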
Disclaimer: The code I posted is untested and was put together using the manual.