Sefam

Reputation: 1773

Getting tags in DOMDocument

I'm trying to get the HTML markup of a table in a page:

$new_dom = new DOMDocument();

$table = '';

$nodesTable = $this->dom->getElementsbyTagName("table");

foreach($nodesTable as $nodeTable){
    $color = $nodeTable->getAttribute('bordercolordark');
    if ($color == '#73BAFF') {
        $table = $nodeTable;
    }
}

$new_dom->appendChild($table);

echo $new_dom->saveHTML();

Here is somepage.html:

<html>
<table>
    <tr> <td> 10 </td> </tr>
    <tr> <td> 10 </td> </tr>
    <tr> <td> 10 </td> </tr>
    <tr> <td> 10 </td> </tr>
</table>

<table border="1" cellpadding="0" width="500" bordercolorlight="#ACD6FF" bordercolordark="#73BAFF" align="center">
    <tr>
        <td rowspan="2" colspan="2" bgcolor="#73BAFF"> </td>
        <td colspan="3" align="center" bgcolor="#ACD6FF"> Element 1 </td>
        <td colspan="3" align="center" bgcolor="#ACD6FF"> Element 2 </td>
    </tr>
    <tr>
        <td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
        <td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
        <td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
        <td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
        <td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
        <td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
    </tr>
    <tr>
        <td bgcolor="#ACD6FF" width="155" align="center"> Row 1</td>
        <td bgcolor="#ACD6FF" width="45" align="center"> 30 </td>
        <td align="center"> 50 </td>
        <td align="center"> 50 </td>
        <td align="center"> 50 </td>
        <td align="center"> 50 </td>
        <td align="center"> 50 </td>
        <td align="center"> 50 </td>
    </tr>
    <tr>
        <td bgcolor="#ACD6FF" width="155" align="center"> Row 2</td>
        <td bgcolor="#ACD6FF" width="45" align="center"> 30 </td>
        <td align="center"> 60 </td>
        <td align="center"> 60 </td>
        <td align="center"> 60 </td>
        <td align="center"> 60 </td>
        <td align="center"> 60 </td>
        <td align="center"> 60 </td>
    </tr>
    <tr>
        <td bgcolor="#ACD6FF" width="155" align="center"> Row 3</td>
        <td bgcolor="#ACD6FF" width="45" align="center"> 30 </td>
        <td align="center"> 70 </td>
        <td align="center"> 70 </td>
        <td align="center"> 70 </td>
        <td align="center"> 70 </td>
        <td align="center"> 70 </td>
        <td align="center"> 70 </td>
    </tr>
</table>

<table>
    <tr> <td> 10 </td> </tr>
    <tr> <td> 10 </td> </tr>
    <tr> <td> 10 </td> </tr>
    <tr> <td> 10 </td> </tr>
</table>

<table>
    <tr> <td> 10 </td> </tr>
    <tr> <td> 10 </td> </tr>
    <tr> <td> 10 </td> </tr>
    <tr> <td> 10 </td> </tr>
</table>

</html>

Calling saveHTML() on $new_dom just outputs \n instead of the table's HTML markup. I looked at this answer: PHP DOMDocument stripping HTML tags, but appending the table that way didn't work either.

Upvotes: 0

Views: 107

Answers (1)

inf3rno

Reputation: 26139

Your code throws:

Fatal error: Uncaught exception 'DOMException' with message 'Wrong Document Error'

So you cannot append a node that belongs to one document directly to another document. If you want to do that, you have to copy it into the target document with importNode() and set the deep flag:

$dom = new DOMDocument();
$dom->loadHTMLFile('x.html');
$new_dom = new DOMDocument();

// Stays null if no table matches.
$table = null;

$nodesTable = $dom->getElementsByTagName('table');

foreach ($nodesTable as $nodeTable) {
    $color = $nodeTable->getAttribute('bordercolordark');
    if ($color === '#73BAFF') {
        // Copy the node into $new_dom; true = include all descendants.
        $table = $new_dom->importNode($nodeTable, true);
    }
}

// Only append if a matching table was found.
if ($table !== null) {
    $new_dom->appendChild($table);
}

echo $new_dom->saveHTML();

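With the second argument ($deep) set to true, importNode() copies the table together with all of its descendants. Without it (the default is false), only the table element itself is imported, not its children.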
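
If you want to check the behaviour of the deep flag yourself, here is a minimal sketch. The inline HTML string and the variable names ($src, $shallowDoc, $deepDoc, $caught) are made up for illustration and are not from your code. It first tries to append the foreign node directly, which should raise the DOMException shown above, and then counts the td cells that survive a shallow and a deep import.

```php
<?php
// Hypothetical inline HTML standing in for the real page.
$src = new DOMDocument();
$src->loadHTML('<html><body><table bordercolordark="#73BAFF"><tr><td>50</td></tr></table></body></html>');
$table = $src->getElementsByTagName('table')->item(0);

// Appending a node owned by another document throws a DOMException.
$caught = null;
try {
    (new DOMDocument())->appendChild($table);
} catch (DOMException $e) {
    $caught = $e->getMessage();
}

// Shallow import: only the <table> element and its attributes are copied.
$shallowDoc = new DOMDocument();
$shallowDoc->appendChild($shallowDoc->importNode($table, false));
$shallowCells = $shallowDoc->getElementsByTagName('td')->length;

// Deep import: the whole subtree below <table> is copied as well.
$deepDoc = new DOMDocument();
$deepDoc->appendChild($deepDoc->importNode($table, true));
$deepCells = $deepDoc->getElementsByTagName('td')->length;

var_dump($caught, $shallowCells, $deepCells);
```

The shallow copy should report no cells and the deep copy one, which is why the deep flag matters when you want the table's full markup.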

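Note: I'd disable the entity loader in your case with libxml_disable_entity_loader(true). I am not sure whether XXE attacks work with loadHTML() too, but it doesn't hurt to be safe. As of PHP 8.0 this function is deprecated, because libxml 2.9.0+ is required there and external entity loading is disabled by default.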
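
A related precaution, only as a sketch: loadHTML() and loadHTMLFile() accept libxml option flags as a second argument, and LIBXML_NONET asks libxml not to access the network while parsing. Whether this makes any practical difference for the HTML parser is an assumption on my part. The inline HTML string is again hypothetical.

```php
<?php
// Hypothetical inline HTML standing in for the real page.
$html = '<html><body><table bordercolordark="#73BAFF"><tr><td>50</td></tr></table></body></html>';

$dom = new DOMDocument();

// Collect parser warnings internally instead of emitting them as PHP warnings.
$previous = libxml_use_internal_errors(true);

// LIBXML_NONET: disable network access while loading the document.
$loaded = $dom->loadHTML($html, LIBXML_NONET);

// Discard collected warnings and restore the previous error handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

$tableCount = $dom->getElementsByTagName('table')->length;
var_dump($loaded, $tableCount);
```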

Upvotes: 2
