zhtway

Reputation: 283

Parsing HTML in PHP

I am having trouble parsing HTML with DOM in PHP. I want to retrieve each href value, but the code gives me an error. I want the row values and the href value together in a two-dimensional array. The last lines of the code give the same error. Any ideas? The output I want is:
1,"http://.....",User
2,"http://..... ",Server
... and so on, in a 2D array.

<html>
<body>
    <table>
        <tbody>
            <tr>
                <td>1 </td>
                <td><a href="http://www.abcd.net"></a></td>
                <td>User</td>
            </tr>
            <tr>
                <td>2 </td>
                <td><a href="http://www.def.net"></a></td>
                <td>Server</td>
            </tr>
        </tbody>
    </table>
  </body>
   </html> 

Here is the PHP code:

$resArr = array();

$dom = new domDocument;
@$dom -> loadHTML(file_get_contents($link));
$dom -> preserveWhiteSpace = false;

$linkt = $dom -> getElementsByTagName('table');
$linkt1 = $linkt -> item(2);

//tr
foreach ($linkt1 -> childNodes as $key => $tag){
    //td
    foreach ($tag -> childNodes as $key1 => $tag1){

        foreach ($tag1 -> childNodes as $key2 => $tag2){
             echo $tag2->hasattribute('href');
                      //Error Occur here ----Fatal error: Call to 
                      //undefined method DOMText::hasattribute() in on line 38
        }
    }
}

$resArr[$i][0] = $tag -> childNodes -> item(0) -> nodeValue;
$resArr[$i][3] = $tag -> childNodes -> item(3) -> nodeValue;
$resArr[$i][1] = $tag1 -> childNodes -> item(1) -> 
  childNodes -> item(0) -> getAttribute('href'); //the same error as above

Upvotes: 1

Views: 1784

Answers (1)

Francis Avila

Reputation: 31621

I'm not sure exactly what output you want, but this looks like a job for XPath. Something like this?

// Your sample html is stored in $html as a string
// Collect libxml parse warnings internally while loading, then restore the default
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_use_internal_errors(false);

$xp = new DOMXPath($dom);

$rows = $xp->query('/html/body/table/tbody/tr');

$resArr = array();
foreach ($rows as $row) {
    $resArr[] = array(
        $xp->evaluate('string(td[1])', $row),
        $xp->evaluate('string(td[2]/a/@href)', $row),
        $xp->evaluate('string(td[3])', $row),
    );
}

var_dump($resArr);

The output from this code:

array(2) {
  [0]=>
  array(3) {
    [0]=>
    string(2) "1 "
    [1]=>
    string(19) "http://www.abcd.net"
    [2]=>
    string(4) "User"
  }
  [1]=>
  array(3) {
    [0]=>
    string(2) "2 "
    [1]=>
    string(18) "http://www.def.net"
    [2]=>
    string(6) "Server"
  }
}
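
For comparison, here is a minimal sketch of the same extraction without XPath. It also shows where the "undefined method DOMText::hasattribute()" error most likely comes from: childNodes includes the whitespace between tags as DOMText nodes, and only DOMElement nodes have hasAttribute() and getAttribute(). The sketch assumes the sample HTML from the question is in $html and that each cell holds either plain text or a single link. Trimming the cell text is my own addition.

```php
<?php
// Assumption: $html holds the sample markup from the question.
$html = <<<'HTML'
<html>
<body>
    <table>
        <tbody>
            <tr>
                <td>1 </td>
                <td><a href="http://www.abcd.net"></a></td>
                <td>User</td>
            </tr>
            <tr>
                <td>2 </td>
                <td><a href="http://www.def.net"></a></td>
                <td>Server</td>
            </tr>
        </tbody>
    </table>
</body>
</html>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$resArr = array();
foreach ($dom->getElementsByTagName('tr') as $tr) {
    $row = array();
    foreach ($tr->childNodes as $cell) {
        // Skip whitespace text nodes (DOMText); only elements carry attributes.
        if (!($cell instanceof DOMElement) || $cell->nodeName !== 'td') {
            continue;
        }
        // Use the href when the cell contains a link, otherwise the trimmed cell text.
        $anchor = $cell->getElementsByTagName('a')->item(0);
        $row[] = ($anchor !== null && $anchor->hasAttribute('href'))
            ? $anchor->getAttribute('href')
            : trim($cell->nodeValue);
    }
    $resArr[] = $row;
}

var_dump($resArr);
```

The instanceof check is the important part. The XPath version avoids the problem because td[1], td[2] and td[3] select only elements.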

Upvotes: 3
