Reputation: 237
Please note that my question is specifically targeted at "Simple HTML DOM Library"! There are 3k+ lines of code and I have no interest in using a different parser.
A reference can be found here: "How to find HTML elements" at http://simplehtmldom.sourceforge.net/manual.htm
With the following code I'm trying to extract homepage URLs from various div elements.
Defining the descendant selector div[...] li a is easy. But to narrow it down to the homepage I tried two things:
1. selecting the li element ... through an index (see below, resulting in the error "trying to get property of non-object")
2. incorporating label=Internet: into the code
Unfortunately I succeeded at neither one :)
Desired Output
http://www.someurl.com/
http://www.anotherurl.com/
Code that doesn't work
foreach($html->find('div[class=contact-data] li a', 6) as $element_details) {
// variable $html contains the Input listed below
// Output $element_details
}
Input (stored in variable $html)
<div class="contact-data">
<ul class="plain-list">
<li>
Somestreet 18</li>
<li>
88888
Somecity</li>
<li>
<label>
Tel:</label>123/123456</li>
<li>
<label>
Fax:</label>123/123457</li>
<li>
<label>
E-Mail:</label><a href="http://www.somesite.com/de/Service/ContactParam?mail_pnr=000290080" onclick="">Contact</a></li>
<li>
<label>
Internet:</label><a href="http://www.someurl.com/">Homepage</a></li>
<li>
<div style="margin-left: 0px">
</div></li>
</ul>
</div>
<div class="contact-data">
<ul class="plain-list">
<li>
Anotherstreet 68</li>
<li>
88888
Anothercity</li>
<li>
<label>
Tel:</label>123/123447</li>
<li>
<label>
Fax:</label>123/123458</li>
<li>
<label>
E-Mail:</label><a href="http://www.anothersite.com/de/Service/ContactParam?mail_pnr=000570030" onclick="">Contact</a></li>
<li>
<label>
Internet:</label><a href="http://www.anotherurl.com/">Homepage</a></li>
<li>
<div style="margin-left: 0px">
</div></li>
</ul>
</div>
Upvotes: 1
Views: 150
Reputation: 19528
Tested and working code
<?php
include "simplehtmldom/simple_html_dom.php";
$str = <<<HTML
<div class="contact-data">
<ul class="plain-list">
<li>
Somestreet 18</li>
<li>
88888
Somecity</li>
<li>
<label>
Tel:</label>123/123456</li>
<li>
<label>
Fax:</label>123/123457</li>
<li>
<label>
E-Mail:</label><a href="http://www.somesite.com/de/Service/ContactParam?mail_pnr=000290080" onclick="">Contact</a></li>
<li>
<label>
Internet:</label><a href="http://www.someurl.com/">Homepage</a></li>
<li>
<div style="margin-left: 0px">
</div></li>
</ul>
</div>
<div class="contact-data">
<ul class="plain-list">
<li>
Anotherstreet 68</li>
<li>
88888
Anothercity</li>
<li>
<label>
Tel:</label>123/123447</li>
<li>
<label>
Fax:</label>123/123458</li>
<li>
<label>
E-Mail:</label><a href="http://www.anothersite.com/de/Service/ContactParam?mail_pnr=000570030" onclick="">Contact</a></li>
<li>
<label>
Internet:</label><a href="http://www.anotherurl.com/">Homepage</a></li>
<li>
<div style="margin-left: 0px">
</div></li>
</ul>
</div>
HTML;
$html = str_get_html($str);
// Find the divs
foreach ($html->find('div[class="contact-data"]') as $div)
{
    // Find the 6th list item;
    // simplehtmldom counts from 0, so that is index 5
    $li = $div->find('ul li', 5);
    // Skip divs that have fewer than six list items
    if (is_null($li))
        continue;
    // Find the last link inside that list item
    $link = $li->find('a', -1);
    // Print the URL only if a link was found
    if (!is_null($link))
        echo $link->href . "\n";
}
When you pass an index to $html->find, it returns a single element (or null) instead of an array, so you cannot foreach over the result.
So first we foreach over the divs, then pick the single li in each one (the 6th item, which is index 5 because the library counts from 0). From there we find the link and check whether it is null: if it is null, no anchor was found; if not, we print its href.
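To make the difference between the two call styles concrete, here is a minimal sketch. It assumes the library sits at the same path as in the code above, and the tiny two-item list is made up purely for illustration.

```php
<?php
// Assumption: same library path as in the answer's code.
include "simplehtmldom/simple_html_dom.php";

// Made-up input, only to show what find() returns in each case.
$html = str_get_html('<ul><li>first</li><li>second</li></ul>');

$all  = $html->find('li');     // no index: an array with every match
$one  = $html->find('li', 1);  // index: one element, counted from 0
$none = $html->find('li', 6);  // index past the last match: null

echo count($all) . "\n";       // number of li elements found
echo $one->plaintext . "\n";   // text of the second li
var_dump(is_null($none));      // whether the out-of-range lookup gave null
```

Only the first form gives you something to loop over; the indexed form should be checked for null before you use it.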
The loop over the divs prints:
http://www.someurl.com/
http://www.anotherurl.com/
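If you want, you can shorten the two find calls to:
$link = $div->find('ul li', 5)->find('a', -1);
Note that this chained form stops with an error if a div has no 6th li, because the first find then returns null.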
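The question also asked about matching on the Internet: label rather than on a position. A rough sketch of that idea follows, still using only Simple HTML DOM. The helper name find_links_by_label is hypothetical (it is not part of the library), the include path is assumed to be the same as above, and the input is a shortened copy of the question's markup.

```php
<?php
// Assumption: same library path as in the answer's code.
include "simplehtmldom/simple_html_dom.php";

// Hypothetical helper, not part of Simple HTML DOM: collects the href of the
// first link in every li whose label text equals $labelText.
function find_links_by_label($html, $labelText)
{
    $urls = array();
    foreach ($html->find('div[class=contact-data]') as $div) {
        foreach ($div->find('li') as $li) {
            $label = $li->find('label', 0);  // null if this li has no label
            if (is_null($label) || trim($label->plaintext) !== $labelText) {
                continue;
            }
            $link = $li->find('a', 0);       // null if this li has no anchor
            if (!is_null($link)) {
                $urls[] = $link->href;
            }
        }
    }
    return $urls;
}

// Shortened copy of the question's input.
$str = <<<HTML
<div class="contact-data">
<ul class="plain-list">
<li>
Somestreet 18</li>
<li>
<label>
E-Mail:</label><a href="http://www.somesite.com/de/Service/ContactParam?mail_pnr=000290080" onclick="">Contact</a></li>
<li>
<label>
Internet:</label><a href="http://www.someurl.com/">Homepage</a></li>
</ul>
</div>
HTML;

$html = str_get_html($str);
foreach (find_links_by_label($html, 'Internet:') as $url) {
    echo $url . "\n";
}
```

This does not depend on the homepage entry always being the 6th li, so it should keep working for contact blocks that have more or fewer address lines.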
Upvotes: 2