Reputation: 395
I have a page that looks something like this:
...
<div class="container">
<div class="info">
<h3>Info 1</h3>
<span class="title">Title for Info 1</span>
<a href="http://www.example.com/1">Link to Example 1</a>
</div> <!-- /info -->
<div class="info">
<h3>Info 2</h3>
<span class="title">Title for Info 2</span>
<a href="http://www.example.com/2">Link to Example 2</a>
</div> <!-- /info -->
<div class="info">
<h3>Info 3</h3>
<span class="title">Title for Info 3</span>
<a href="http://www.example.com/3">Link to Example 3</a>
</div> <!-- /info -->
</div> <!-- /container -->
...
Each div with the class info has the same structure. I'd like to loop through the document and, for each of those divs, parse its components into either an array or individual variables, so that I can output the data in a human-readable format such as a CSV file or an HTML table.
I've tried using DOMDocument with getElementsByTagName to extract the contents of each tag, but because each div contains multiple tag types (h3, span, a), I haven't figured out how to accomplish this.
In the end, I want to be able to put the data in a format like this:
divclass, h3, spanclass, spantitle, ahref, a
info, Info 1, title, Title for Info 1, http://www.example.com/1, Link to Example 1
...
Thanks!
Upvotes: 0
Views: 3544
Reputation: 9427
<?php
$html = '
<div class="container">
<div class="info">
<h3>Info 1</h3>
<span class="title">Title for Info 1</span>
<a href="http://www.example.com/1">Link to Example 1</a>
</div> <!-- /info -->
<div class="info">
<h3>Info 2</h3>
<span class="title">Title for Info 2</span>
<a href="http://www.example.com/2">Link to Example 2</a>
</div> <!-- /info -->
<div class="info">
<h3>Info 3</h3>
<span class="title">Title for Info 3</span>
<a href="http://www.example.com/3">Link to Example 3</a>
</div> <!-- /info -->
</div> <!-- /container -->
';
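$dom_document = new DOMDocument();
// preserveWhiteSpace only takes effect if it is set before the markup is loaded
$dom_document->preserveWhiteSpace = false;
$dom_document->loadHTML($html);
// use DOMXPath to navigate the HTML through the DOM
$dom_xpath = new DOMXPath($dom_document);
$elements = $dom_xpath->query("//*[@class='info']");
// query() returns false (not null) when the expression is invalid
if ($elements !== false) {
    foreach ($elements as $element) {
        echo "\n[" . $element->nodeName . "]\n";
        foreach ($element->childNodes as $node) {
            // skip whitespace-only text nodes and comments between the tags
            if ($node->nodeType !== XML_ELEMENT_NODE) {
                continue;
            }
            echo $node->nodeValue . "\n";
        }
    }
}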
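To get all the way to the row format asked for in the question (divclass, h3, spanclass, spantitle, ahref, a), one possible approach is to run XPath queries relative to each info div and collect the pieces into an array. The following is only a sketch: the function name infoDivsToRows and the shortened sample markup are made up for illustration, it assumes every info div holds at most one h3, span and a, and the empty escape argument passed to fputcsv assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper (name chosen for this example): returns one row per div.info,
// in the column order divclass, h3, spanclass, spantitle, ahref, a.
function infoDivsToRows(string $html): array
{
    // Collect libxml warnings internally instead of printing them for fragments.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    // Exact match on the class attribute; a div with several classes would need
    // a contains()-based expression instead.
    foreach ($xpath->query("//div[@class='info']") as $div) {
        // The second argument makes each query relative to the current div.
        $h3   = $xpath->query('./h3', $div)->item(0);
        $span = $xpath->query('./span', $div)->item(0);
        $a    = $xpath->query('./a', $div)->item(0);

        // item(0) is null when the child is missing, so fall back to ''.
        $rows[] = [
            $div->getAttribute('class'),
            $h3 ? trim($h3->textContent) : '',
            $span ? $span->getAttribute('class') : '',
            $span ? trim($span->textContent) : '',
            $a ? $a->getAttribute('href') : '',
            $a ? trim($a->textContent) : '',
        ];
    }
    return $rows;
}

// Usage with a shortened version of the markup from the question.
$sample = '
<div class="container">
  <div class="info">
    <h3>Info 1</h3>
    <span class="title">Title for Info 1</span>
    <a href="http://www.example.com/1">Link to Example 1</a>
  </div>
  <div class="info">
    <h3>Info 2</h3>
    <span class="title">Title for Info 2</span>
    <a href="http://www.example.com/2">Link to Example 2</a>
  </div>
</div>';

$out = fopen('php://output', 'w');
fputcsv($out, ['divclass', 'h3', 'spanclass', 'spantitle', 'ahref', 'a'], ',', '"', '');
foreach (infoDivsToRows($sample) as $row) {
    // fputcsv takes care of quoting fields that need it.
    fputcsv($out, $row, ',', '"', '');
}
fclose($out);
```

Because the rows come back as a plain array, the same data could just as well be written to a file or looped over to build an HTML table instead of being sent to php://output.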
Upvotes: 4