somethis

Reputation: 237

Parsing HTML with Indices or Innertext

Please note that my question is specifically targeted at the "Simple HTML DOM Library"! There are 3k+ lines of code and I have no interest in using a different parser.

A reference can be found here: "How to find HTML elements" at http://simplehtmldom.sourceforge.net/manual.htm


With the following code I'm trying to extract homepage URLs from various div elements.

Defining the descendant selector div[...] li a is easy. To narrow it down to the homepage link, I tried:

  1. selecting the 6th li element through an index (see below; this results in the error "trying to get property of non-object")
  2. working that strange label "Internet:" into the selector

Unfortunately I succeeded at neither one :)

Desired Output

http://www.someurl.com/
http://www.anotherurl.com/

Code that doesn't work

foreach($html->find('div[class=contact-data] li a', 6) as $element_details) {
// variable $html contains the Input listed below 

    // Output $element_details

    }

Input (stored in variable $html)

<div class="contact-data">
    <ul class="plain-list">
    <li>
        Somestreet 18</li>
    <li>
        88888
        Somecity</li>
    <li>
        <label>
        Tel:</label>123/123456</li>
    <li>
        <label>
        Fax:</label>123/123457</li>

    <li>
        <label>
        E-Mail:</label><a href="http://www.somesite.com/de/Service/ContactParam?mail_pnr=000290080" onclick="">Contact</a></li>
    <li>
        <label>
        Internet:</label><a href="http://www.someurl.com/">Homepage</a></li>
    <li>    
        <div style="margin-left: 0px">
        </div></li>
    </ul>
</div>

<div class="contact-data">
    <ul class="plain-list">
    <li>
        Anotherstreet 68</li>
    <li>
        88888
        Anothercity</li>
    <li>
        <label>
        Tel:</label>123/123447</li>
    <li>
        <label>
        Fax:</label>123/123458</li>

    <li>
        <label>
        E-Mail:</label><a href="http://www.anothersite.com/de/Service/ContactParam?mail_pnr=000570030" onclick="">Contact</a></li>
    <li>
        <label>
        Internet:</label><a href="http://www.anotherurl.com/">Homepage</a></li>
    <li>    
        <div style="margin-left: 0px">
        </div></li>
    </ul>
</div>

Upvotes: 1

Views: 150

Answers (1)

Prix

Reputation: 19528

Tested and working code

<?php
include "simplehtmldom/simple_html_dom.php";

$str = <<<HTML
<div class="contact-data">
    <ul class="plain-list">
    <li>
        Somestreet 18</li>
    <li>
        88888
        Somecity</li>
    <li>
        <label>
        Tel:</label>123/123456</li>
    <li>
        <label>
        Fax:</label>123/123457</li>

    <li>
        <label>
        E-Mail:</label><a href="http://www.somesite.com/de/Service/ContactParam?mail_pnr=000290080" onclick="">Contact</a></li>
    <li>
        <label>
        Internet:</label><a href="http://www.someurl.com/">Homepage</a></li>
    <li>
        <div style="margin-left: 0px">
        </div></li>
    </ul>
</div>

<div class="contact-data">
    <ul class="plain-list">
    <li>
        Anotherstreet 68</li>
    <li>
        88888
        Anothercity</li>
    <li>
        <label>
        Tel:</label>123/123447</li>
    <li>
        <label>
        Fax:</label>123/123458</li>

    <li>
        <label>
        E-Mail:</label><a href="http://www.anothersite.com/de/Service/ContactParam?mail_pnr=000570030" onclick="">Contact</a></li>
    <li>
        <label>
        Internet:</label><a href="http://www.anotherurl.com/">Homepage</a></li>
    <li>
        <div style="margin-left: 0px">
        </div></li>
    </ul>
</div>
HTML;

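$html = str_get_html($str);

// Loop over each contact block
foreach ($html->find('div[class="contact-data"]') as $div)
{
    // The homepage is in the 6th list item;
    // Simple HTML DOM indices are zero-based, so that is index 5
    $li = $div->find('ul li', 5);
    // Skip blocks that have fewer than six list items
    if (is_null($li))
        continue;
    // Take the last link inside that list item (-1 = last match)
    $link = $li->find('a', -1);

    // Print the URL only if an anchor was found
    if (!is_null($link))
        echo $link->href . "\n";
}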

You cannot pass an index to $html->find inside a foreach: with an index, find returns a single element (or null) rather than an array, so there is nothing to loop over.
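
To make the difference concrete, here is a minimal sketch. The two-item list is a made-up input for illustration only, and the include path is assumed to match the one used in the code above.

```php
<?php
include "simplehtmldom/simple_html_dom.php";

// Hypothetical two-item list, used only to show the return types
$html = str_get_html('<ul><li><a href="#a">A</a></li><li><a href="#b">B</a></li></ul>');

$all  = $html->find('li a');     // no index: array of every match
$one  = $html->find('li a', 1);  // index given: one element (zero-based)
$none = $html->find('li a', 6);  // index out of range: null

var_dump(is_array($all), is_object($one), is_null($none));
```

Only the first form is something you can pass to foreach.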

So we first loop over the divs, then fetch the single li in each one. It is the 6th item, but since Simple HTML DOM counts from 0 it sits at index 5. From there we look up the link and check whether it is null: if it is, no anchor was found; otherwise we print its href.

Output is:

http://www.someurl.com/
http://www.anotherurl.com/

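If you are sure every div has at least six list items, you can shorten it to the line below (the chained call fails with an error if the li is missing):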

$link = $div->find('ul li', 5)->find('a', -1);
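
The question also asked about matching on the "Internet:" label instead of a position. I have not seen a selector in Simple HTML DOM that filters on an element's text, so the following is only a sketch of one way to do it by hand: loop over the label elements, compare their trimmed plaintext, and look for the link in the enclosing li. The sample input is a shortened copy of the question's HTML, and the $urls array is a name introduced here for illustration.

```php
<?php
include "simplehtmldom/simple_html_dom.php";

// Shortened copy of the question's input: one E-Mail item, one Internet item
$str = <<<HTML
<div class="contact-data">
    <ul class="plain-list">
    <li>
        <label>
        E-Mail:</label><a href="http://www.somesite.com/de/Service/ContactParam?mail_pnr=000290080" onclick="">Contact</a></li>
    <li>
        <label>
        Internet:</label><a href="http://www.someurl.com/">Homepage</a></li>
    </ul>
</div>
HTML;

$html = str_get_html($str);
$urls = array();

foreach ($html->find('div[class=contact-data] li label') as $label)
{
    // The label text carries surrounding whitespace, so trim before comparing
    if (trim($label->plaintext) !== 'Internet:')
        continue;

    // The link is a sibling of the label, so search the enclosing li
    $link = $label->parent()->find('a', 0);
    if (!is_null($link))
        $urls[] = $link->href;
}

echo implode("\n", $urls) . "\n";
```

This does not depend on the homepage being the 6th item, which may help if some entries have no fax or e-mail line.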

Upvotes: 2
