Jimmy Spanny

Reputation: 11

Parsing HTML to find certain elements in PHP

I am using cURL to retrieve a page and store its HTML. This works, and I end up with a variable containing HTML similar to this (the content of each <td> differs and changes between requests):

html code above....

   <tr class="myclass">
     <td>Dynamic Content One</td>
     <td>Dynamic Content Two</td>
     <td>Dynamic Content Three</td>
   </tr>

   <tr class="myclass">
     <td>Dynamic Content One</td>
     <td>Dynamic Content Two</td>
     <td>Dynamic Content Three</td>
   </tr>

   More of the same <tr> ......

html code below....

My goal now is to parse the HTML into an array called $result that holds one element per <tr>, where each element is an associative array of that row's cells. The array should look like this:

$result[0]["first_content"] = "Dynamic Content One"
$result[0]["second_content"] = "Dynamic Content Two"
$result[0]["third_content"] = "Dynamic Content Three"

$result[1]["first_content"] = "Dynamic Content One"
$result[1]["second_content"] = "Dynamic Content Two"
$result[1]["third_content"] = "Dynamic Content Three"

... more elements in the array, depending on how many <tr> rows there are

I found it quite tricky to parse something like this. I have used the DOMDocument and DOMXPath classes, but all I have achieved is a flat array with one element per <td>, and I am not sure where to put the logic that groups the cells into one array per row. Perhaps there is a better way to do it? Here is my current code:

$dom = new DOMDocument;
@$dom->loadHTML($retrievedHtml);

$xPath = new DOMXPath($dom);

$xPathQuery = "//tr[@class='myclass']";
$elements = $xPath->query($xPathQuery);

if (!is_null($elements)) {

    $results = array();

    foreach ($elements as $element) {

        $nodes = $element->childNodes;

        print $nodes->nodeValue;

        foreach ($nodes as $node) {
            $results[] = $node->nodeValue;
        }
    }
}

Upvotes: 1

Views: 1208

Answers (1)

Professor Abronsius

Reputation: 33804

To get the structure of the output array you describe (minus the textual keys such as "first_content"), add a new dimension to the array for every row and populate that dimension with the row's cells. I think this is what you were trying to achieve anyway!

$dom = new DOMDocument;
@$dom->loadHTML( $retrievedHtml );

$xPath = new DOMXPath( $dom );

$xPathQuery = "//tr[@class='myclass']";
$elements = $xPath->query( $xPathQuery );

/* query() returns false (not null) when the expression is malformed */
if( $elements !== false ){

    $results = array();

    foreach( $elements as $index => $element ){

        $nodes = $element->childNodes;

        foreach( $nodes as $node ){
            /* childNodes also contains whitespace text nodes, so keep element nodes (the <td> cells) only */
            /* Each table row gets its own level in the array, keyed by $index */
            if( $node->nodeType == XML_ELEMENT_NODE ) $results[ $index ][] = $node->nodeValue;
        }
    }

    echo '<pre>',print_r( $results, true ),'</pre>';
}
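
If you also want the textual keys from the question, one possible approach is to collect each row's cells and pair them with a list of key names using array_combine(). The following is only a sketch under some assumptions: the sample HTML is a hypothetical stand-in for $retrievedHtml, the $keys list simply reuses the names from the question, and rows whose cell count does not match the key list are skipped. It also uses a relative XPath query (./td) instead of childNodes, and libxml_use_internal_errors() instead of the @ operator; both are substitutions on my part rather than anything the original code requires.

```php
<?php
// Hypothetical sample input standing in for $retrievedHtml from the question.
$retrievedHtml = <<<HTML
<table>
  <tr class="myclass">
    <td>Dynamic Content One</td>
    <td>Dynamic Content Two</td>
    <td>Dynamic Content Three</td>
  </tr>
  <tr class="myclass">
    <td>Dynamic Content Four</td>
    <td>Dynamic Content Five</td>
    <td>Dynamic Content Six</td>
  </tr>
</table>
HTML;

// Key names taken from the question; one per expected <td> column (an assumption).
$keys = array('first_content', 'second_content', 'third_content');

// Collect parser warnings internally instead of silencing them with @.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($retrievedHtml);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xPath = new DOMXPath($dom);
$results = array();

foreach ($xPath->query("//tr[@class='myclass']") as $row) {
    $cells = array();

    // Relative query: only the <td> children of this row, so whitespace
    // text nodes between the cells are never visited.
    foreach ($xPath->query('./td', $row) as $cell) {
        $cells[] = trim($cell->nodeValue);
    }

    // Skip rows whose column count does not match the key list.
    if (count($cells) === count($keys)) {
        $results[] = array_combine($keys, $cells);
    }
}

print_r($results);
```

With this, $results[0]["first_content"] holds the first cell of the first matching row, and so on. If the real table can have a varying number of columns, the count check would need to be replaced with whatever rule suits your data.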

Upvotes: 1
