Marc

Reputation: 9527

PHP DOM - How to retrieve nodeValue and href of a specific element

I have a loop in which I would like to collect the nodeValue and href of the first a element of the first div element. I am new to DOM, so I read several posts and came up with the following code. Unfortunately, it is not working. I hope someone can help me with this. Thank you in advance for your replies. Cheers, Marc

Here is the HTML structure of $value (see the loop below to understand what $value is):

    <body>
        <div>
           <a href="myhreflink">this is my target</a>
           <a href="somehref">sometext</a>
        </div>

        <div></div>
        <div></div>
        <div></div>
    </body>

Here is my loop:

$myarray = array();
 /* $content is a DOMNodeList Object and $value is a DOMElement Object */

foreach ($content as $value){

    $firstDiv = $value->getElementsByTagName('div')[0];
    $firstA = $firstDiv->getElementsByTagName('a')[0];

    $val = $firstA->nodeValue;
    $link = $firstA->getAttribute('href');

    array_push($myarray, array('val'=>$val, 'href'=>$link));    
}

Here is the full code:

<?php
header('Content-Type: text/html; charset=utf-8');
mysql_set_charset('utf8'); 
ini_set('display_errors', 1); error_reporting(E_ALL);

$liste = array();


    $url = 'myurl';
    $path = 'mypath'; 
    $titres = print_url_data($url, $path);

    foreach($titres as $value){
        array_push($liste, $value);
    }




// -> functions ------------------------------------------------------------

function print_url_data($url, $path){
    $content = get_url_data($url, $path);
    $tableau = array();
    foreach ($content as $value){

        $firstDiv = $value->getElementsByTagName('div')->item(0);
        $firstA = $firstDiv->getElementsByTagName('a')->item(0);

        $val = $firstA->nodeValue;
        $link = $firstA->getAttribute('href');

        array_push($myarray, array('val'=>$val, 'href'=>$link));  
    }
    return $tableau;
}

function get_url_data($url, $path){
    $xml_content = get_url($url);
    $dom = new DOMDocument();
    @$dom->loadHTML($xml_content);
    $xpath = new DomXPath($dom);
    $content_title = $xpath->query($path);
    return $content_title;
}

function get_url($url){

    $curl = curl_init();

    $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // browsers keep this blank.

    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Googlebot/2.1 (+http://www.google.com/bot.html)');
    curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
    curl_setopt($curl, CURLOPT_REFERER, 'http://www.google.com');
    curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($curl, CURLOPT_AUTOREFERER, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);


    $html = curl_exec($curl); 
    curl_close($curl);
    return $html; 
}

?>

Upvotes: 0

Views: 4805

Answers (1)

Michael Berkowski

Reputation: 270609

This kind of array dereferencing on a function's return value is not valid in PHP versions before the brand new PHP 5.4.

// Can't do this unless running PHP 5.4 (which seems unlikely).
// Without a way to test, I'm not sure that DOMDocument would even
// support this under PHP 5.4.
$firstDiv = $value->getElementsByTagName('div')[0];
$firstA = $firstDiv->getElementsByTagName('a')[0];

Instead, retrieve the node by index with item(), which is a method of the DOMNodeList that getElementsByTagName() returns.

$firstDiv = $value->getElementsByTagName('div')->item(0);
$firstA = $firstDiv->getElementsByTagName('a')->item(0);
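To make the item() approach concrete, here is a minimal, self-contained sketch run against the sample markup from the question. The $html string is an assumption standing in for the content fetched with cURL, and the second and third empty div elements are left out for brevity.

```php
<?php
// Sample markup from the question (assumption: the real input comes from cURL).
$html = '<body><div><a href="myhreflink">this is my target</a>'
      . '<a href="somehref">sometext</a></div><div></div></body>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);

// getElementsByTagName() returns a DOMNodeList; item(0) returns its first
// node, or null when the list is empty.
$firstDiv = $dom->getElementsByTagName('div')->item(0);
$firstA   = $firstDiv->getElementsByTagName('a')->item(0);

$val  = $firstA->nodeValue;              // text content of the first <a>
$link = $firstA->getAttribute('href');   // its href attribute
```

With this input, $val should hold "this is my target" and $link should hold "myhreflink".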

Update

I think I get it now: since $content is a NodeList, you don't want to iterate over it with foreach. Instead, call getElementsByTagName() on it directly. Iterating over it gives you individual nodes rather than node lists, and as far as I'm aware you can't call getElementsByTagName() on individual nodes.

function print_url_data($url, $path){
    $content = get_url_data($url, $path);
    $tableau = array();

    // No need to loop. get nodes directly from $content   
    $firstDiv = $content->getElementsByTagName('div')->item(0);
    $firstA = $firstDiv->getElementsByTagName('a')->item(0);

    $val = $firstA->nodeValue;
    $link = $firstA->getAttribute('href');

    // this is not going to work since $myarray isn't initialized
    // you might have meant to use $tableau
    array_push($myarray, array('val'=>$val, 'href'=>$link));  

    return $tableau;
}
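If each $value really is a DOMElement, as the comment in the question states, the loop can also be kept, because DOMElement provides getElementsByTagName(). The following is only a sketch under that assumption: the function name first_links, the inline $html sample, and the '//body' path are hypothetical and not part of the original code. It guards against missing nodes and collects results into $tableau.

```php
<?php
// Hypothetical helper (not from the original post): for every node matched by
// $path, collect the text and href of the first <a> inside the first <div>.
function first_links($html, $path) {
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query($path);
    $tableau = array();
    if ($nodes === false) {
        return $tableau;                // invalid XPath expression
    }

    foreach ($nodes as $value) {
        if (!($value instanceof DOMElement)) {
            continue;                   // e.g. text nodes have no getElementsByTagName()
        }
        $firstDiv = $value->getElementsByTagName('div')->item(0);
        if ($firstDiv === null) {
            continue;                   // no <div> under this node
        }
        $firstA = $firstDiv->getElementsByTagName('a')->item(0);
        if ($firstA === null) {
            continue;                   // first <div> contains no <a>
        }
        $tableau[] = array(
            'val'  => $firstA->nodeValue,
            'href' => $firstA->getAttribute('href'),
        );
    }
    return $tableau;
}

// Usage with the sample markup from the question (assumed input).
$html = '<body><div><a href="myhreflink">this is my target</a>'
      . '<a href="somehref">sometext</a></div><div></div></body>';
$result = first_links($html, '//body');
```

Here $result should contain a single entry with 'val' set to "this is my target" and 'href' set to "myhreflink". Checking item(0) against null avoids a fatal error when a matched node has no div or the div has no link.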

Upvotes: 1
