Timothée HENRY

Reputation: 14604

PHP DOM parsing to get to elements inside specific div id

I have some html as follows:

<div id="tvcap">
    <div class="c" id="tads">
        <ol>
            <li>
                <div class="vsc vsta">
                    <h3>
                        <a id="pa1" href="">
                        </a>
                        <a id="vpa1" href="http://www.link1.com">
                        Link 1 Text 1</a>
                    </h3>

                    <div>
                        <div class="kv kva">
                            <cite>
                            www.link1.com</cite>
                        </div>
                    </div>

                    <span class="ac">Link 1 Text2</span>
                </div>
            </li>

            <li>
                <div class="vsc vsta">
                <h3>
                <a id="pa2" href="">
                </a>
                <a id="vpa2" href="http://www.link2.com">Link 2 Text 1</a>
                </h3>

                <div>
                    <div class="kv kva">
                    <cite>www.link2.com</cite>
                    </div>
                </div>

                <span class="ac">Link 2 Text 3</span>
                <div>
                <div class="oslk">
                </div>
                </div>
                </div>
            </li>
        </ol>
    </div>
</div>

There may be an unknown number of links and texts, and I want to iterate over them and get each link and its text.

I am using the Simple HTML DOM Parser.

I cannot find the command to get to the element with id 'vpa1'.

I tried this, but it returns nothing:

foreach ($html->find('a') as $element) {
    if ($element->id == "vpa1") {
        echo $element->href . '<br>';
    }
}

How can I get each link and its text based on the id being vpa[$i] (vpa1, vpa2, etc.)?

Upvotes: 1

Views: 10727

Answers (3)

Amal Murali

Reputation: 76656

Function to extract the contents from a specific div id from any webpage

The function below extracts the element with the specified ID (including its own tag and everything inside it) and returns it as markup. If no element with that ID is found, it returns false.

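function getHTMLByID($id, $html) {
    $dom = new DOMDocument;
    libxml_use_internal_errors(true); // don't emit warnings for malformed real-world HTML
    $dom->loadHTML($html);
    libxml_clear_errors();            // discard the collected parse errors
    $node = $dom->getElementById($id);
    if ($node) {
        // saveXML($node) serializes the element itself together with its contents
        return $dom->saveXML($node);
    }
    return FALSE;
}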

$id is the ID of the element whose content you want to extract, and $html is your HTML markup.
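Because getElementById() works on any element, not only divs, the same function can pull out the vpa1 link from the question directly. The sketch below repeats the function so it runs on its own and feeds it a trimmed-down copy of the question's markup (the shortened HTML string is an assumption made for brevity):

```php
<?php
// Same approach as getHTMLByID() above, repeated so this sketch is self-contained.
function getHTMLByID($id, $html) {
    $dom = new DOMDocument;
    libxml_use_internal_errors(true);   // silence warnings about imperfect HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    $node = $dom->getElementById($id);
    if ($node) {
        return $dom->saveXML($node);    // the element itself plus its contents
    }
    return FALSE;
}

// A shortened copy of the markup from the question (assumed for illustration).
$html = '<div id="tvcap"><div class="c" id="tads"><ol><li>'
      . '<h3><a id="vpa1" href="http://www.link1.com">Link 1 Text 1</a></h3>'
      . '</li></ol></div></div>';

echo getHTMLByID('vpa1', $html), "\n";
// <a id="vpa1" href="http://www.link1.com">Link 1 Text 1</a>

var_dump(getHTMLByID('no-such-id', $html));
// bool(false)
```

Note that the function returns serialized markup, so if you need the href or the text separately you would still have to work with the DOMElement rather than the string.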

Usage example:

$html = file_get_contents('http://www.mysql.com/');
echo getHTMLByID('tagline', $html);

Output:

The world's most popular open source database

Upvotes: 5

Timothée HENRY

Reputation: 14604

What worked for me was to first find the div with the specified id using the following command (with the Simple HTML DOM Parser):

$div = $html->find('div#' . $divId, 0); // index 0 returns the first match instead of an array

and then use $div to search inside it, for example with $div->find('a').
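Simple HTML DOM is a third-party library, not part of PHP. The same two-step idea (find the container by id, then search only inside it) can be sketched with PHP's built-in DOM extension instead. This is a swapped-in technique, not the method used above; the sample markup is a cut-down copy of the question's, and skipping anchors with an empty href (the pa1/pa2 placeholders) is an assumption about what counts as a "link":

```php
<?php
// Cut-down version of the question's markup (assumed for illustration).
$html = <<<HTML
<div id="tvcap">
  <div class="c" id="tads">
    <ol>
      <li><h3><a id="pa1" href=""></a><a id="vpa1" href="http://www.link1.com">Link 1 Text 1</a></h3></li>
      <li><h3><a id="pa2" href=""></a><a id="vpa2" href="http://www.link2.com">Link 2 Text 1</a></h3></li>
    </ol>
  </div>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Step 1: locate the container div by its id.
$div = $dom->getElementById('tads');

// Step 2: search only inside that div.
$links = [];
if ($div !== null) {
    foreach ($div->getElementsByTagName('a') as $a) {
        // Skip the empty placeholder anchors that have no href.
        if ($a->getAttribute('href') === '') {
            continue;
        }
        $links[] = [$a->getAttribute('href'), trim($a->textContent)];
    }
}

foreach ($links as [$href, $text]) {
    echo $href, ' => ', $text, "\n";
}
// http://www.link1.com => Link 1 Text 1
// http://www.link2.com => Link 2 Text 1
```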

Upvotes: 1

Vadim

Reputation: 642

As @Wrikken said, XPath will not be the fastest option, but it is a simple solution.

Here is code you can use as a starting point:

// Assumes your HTML has been saved to some_html.html
$some_html = file_get_contents('some_html.html');

$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings about malformed HTML
$doc->loadHTML($some_html);

$xpath = new DOMXPath($doc);

// Find the element whose id is "vpa1"
$result = $xpath->query('//*[@id="vpa1"]');

var_dump($result);

// query() returns false if the expression is invalid
if ($result !== false) {
    foreach ($result as $link) {
        var_dump($link->nodeValue);
    }
}
// output
// object(DOMNodeList)#4 (1) { ["length"]=> int(1) } string(38) " Link 1 Text 1"

// Find every <a> element in the document
$result = $xpath->query('//a');
var_dump($result);

if ($result !== false) {
    foreach ($result as $link) {
        var_dump($link->nodeValue);
    }
}
// output
// object(DOMNodeList)#8 (1) { ["length"]=> int(4) } string(25) " " string(38) " Link 1 Text 1" string(17) " " string(13) "Link 2 Text 1"
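To handle vpa1, vpa2, ... without knowing how many there are, the XPath query can match an id prefix instead of an exact value. The sketch below is built on assumptions: the markup is a shortened copy of the question's, the extra `^vpa\d+$` check (so that an id such as "vpaX" is not picked up) and the `ac` key holding the text of the neighbouring `span.ac` are illustrative additions, not part of the answer above:

```php
<?php
// Shortened copy of the question's markup (assumed for illustration).
$html = <<<HTML
<ol>
  <li><h3><a id="pa1" href=""></a>
      <a id="vpa1" href="http://www.link1.com">
      Link 1 Text 1</a></h3>
      <span class="ac">Link 1 Text2</span></li>
  <li><h3><a id="pa2" href=""></a>
      <a id="vpa2" href="http://www.link2.com">Link 2 Text 1</a></h3>
      <span class="ac">Link 2 Text 3</span></li>
</ol>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$results = [];
// starts-with() matches vpa1, vpa2, ... however many there are.
foreach ($xpath->query('//a[starts-with(@id, "vpa")]') as $a) {
    $id = $a->getAttribute('id');
    if (!preg_match('/^vpa\d+$/', $id)) {
        continue; // only accept "vpa" followed by digits
    }
    // The span.ac text sits in the same <li> as the link.
    $li   = $xpath->query('ancestor::li[1]', $a)->item(0);
    $span = $li ? $xpath->query('.//span[@class="ac"]', $li)->item(0) : null;

    $results[$id] = [
        'href' => $a->getAttribute('href'),
        'text' => trim($a->textContent), // trim the surrounding whitespace/newlines
        'ac'   => $span ? trim($span->textContent) : null,
    ];
}

print_r($results);
```

Keying the results by id keeps the vpa[$i] numbering visible, so `$results['vpa2']['href']` gives the second link directly.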

Upvotes: 3
