Zulakis

Reputation: 8394

PHP DOMDocument: parse nested tables

I have a table that looks like this: http://pastebin.com/jjZxeNHF

I have it loaded as a PHP DOMDocument.

Now I want to "parse" this table.

If I am correct, something like the following is not going to work, because $superTable->getElementsByTagName('tr') returns not only the outer tr elements but also the inner ones.

foreach ($superTable->getElementsByTagName('tr') as $superRow) {
    foreach ($superRow->getElementsByTagName('td') as $superCol) {
        foreach ($superCol->getElementsByTagName('table') as $table) {
            foreach ($table->getElementsByTagName('tr') as $row) {
                foreach ($row->getElementsByTagName('td') as $col) {
                }
            }
        }
    }
}

How can I go through all the tables, field by field, as the snippet above tries to do?

Upvotes: 2

Views: 2169

Answers (2)

cHao

Reputation: 86575

You could use XPath to eliminate a lot of the blatantly low-level iteration and reduce the apparent complexity of all this...

$xpath = new DOMXPath($document);
foreach ($xpath->query('//selector/for/superTable//table') as $table) {
    // in case you really wanted them...
    $superCol = $table->parentNode;
    $superRow = $superCol->parentNode;

    foreach ($table->getElementsByTagName('td') as $col) {
        $row = $col->parentNode;
        // do your thing with each cell here
    }
}

You could drill down further than this, if you wanted -- if you just wanted every cell in the inner tables, you could reduce it to one loop over //selector/for/superTable//table//td.
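As a rough illustration of that single-loop variant, here is a minimal sketch. The sample markup and the id="super" selector are assumptions made up for this example; the real table from the question may need a different selector.

```php
<?php
// Hypothetical sample markup: an outer table (id="super") holding two inner tables.
$html = '<table id="super"><tr><td>'
      . '<table><tr><td>a</td><td>b</td></tr></table>'
      . '</td><td>'
      . '<table><tr><td>c</td></tr></table>'
      . '</td></tr></table>';

$document = new DOMDocument();
$document->loadHTML($html);

$xpath = new DOMXPath($document);
$cells = [];
// Select every td that sits inside a table nested in the outer table.
foreach ($xpath->query('//table[@id="super"]//table//td') as $col) {
    $cells[] = trim($col->textContent);
}

print_r($cells);
```

The outer cells are never visited here, because the query only matches td elements that have a nested table as an ancestor.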

Of course, if you're dealing with valid HTML, you could also just loop over each element's children. It all depends on what the HTML will look like, and exactly what you need from it.
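A minimal sketch of that children-only approach might look like the following. The helper name childElements and the sample markup are hypothetical, and the sketch assumes the rows sit directly under the table element (no tbody wrapper in the source).

```php
<?php
// Hypothetical helper: return only the direct child elements with the given tag name.
function childElements(DOMNode $parent, string $name): array
{
    $found = [];
    foreach ($parent->childNodes as $child) {
        // Skip text nodes (whitespace) and anything that is not the wanted element.
        if ($child instanceof DOMElement && strtolower($child->nodeName) === $name) {
            $found[] = $child;
        }
    }
    return $found;
}

// Hypothetical sample markup: one outer row with two cells, one of them holding an inner table.
$html = '<table><tr><td><table><tr><td>inner</td></tr></table></td><td>plain</td></tr></table>';
$document = new DOMDocument();
$document->loadHTML($html);

// The first table in document order is the outer one.
$superTable = $document->getElementsByTagName('table')->item(0);

$outerRows  = childElements($superTable, 'tr');                  // outer rows only
$outerCells = childElements($outerRows[0], 'td');                // cells of the first outer row
$allRows    = $superTable->getElementsByTagName('tr')->length;   // outer and inner rows

echo count($outerRows), ' outer row(s), ', $allRows, ' row(s) in total', PHP_EOL;
```

Unlike getElementsByTagName, which returns every descendant, walking childNodes stays on one level, so the inner rows are not mixed in with the outer ones.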

Edit: If you can't use XPath for some reason, you could do something like this:

// I assume you've found $superTable already
foreach ($superTable->getElementsByTagName('table') as $table) {
    $superCol = $table->parentNode;
    $superRow = $superCol->parentNode;
    foreach ($table->getElementsByTagName('td') as $col) {
        $row = $col->parentNode;
        // do your thing here
    }
}

Note that neither solution bothers to iterate over the rows etc. That's a big part of what obviates the need to get only rows in the current table. You're only looking for tables within the table, which by definition (1) will be the sub-tables and (2) will be within a column within a row within the main table, and you can get the parent row and column from the table element itself.

Of course, both solutions assume you're only nesting tables one level deep. If it's more than that, you're going to want to look at a recursive solution and DOMElement's childNodes property, or at a more narrowly focused XPath query.
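For deeper nesting, a recursive walk over childNodes could be sketched roughly as below. The function name walkTables, the output format, and the sample markup are all assumptions for illustration, not part of the original question.

```php
<?php
// Hypothetical recursive walker: records [depth, text] for each cell,
// where depth is the number of tables the cell is nested in.
function walkTables(DOMNode $node, int $depth, array &$cells): void
{
    foreach ($node->childNodes as $child) {
        if (!($child instanceof DOMElement)) {
            continue; // skip text nodes, comments and the doctype
        }
        $name = strtolower($child->nodeName);
        if ($name === 'table') {
            // Entering a table: everything below it is one level deeper.
            walkTables($child, $depth + 1, $cells);
            continue;
        }
        if ($name === 'td') {
            // Collect only the text that belongs directly to this cell.
            $own = '';
            foreach ($child->childNodes as $grandChild) {
                if ($grandChild instanceof DOMText) {
                    $own .= $grandChild->nodeValue;
                }
            }
            if (trim($own) !== '') {
                $cells[] = [$depth, trim($own)];
            }
        }
        walkTables($child, $depth, $cells);
    }
}

// Hypothetical sample markup: tables nested three levels deep.
$html = '<table><tr><td>one<table><tr><td>two'
      . '<table><tr><td>three</td></tr></table>'
      . '</td></tr></table></td></tr></table>';
$document = new DOMDocument();
$document->loadHTML($html);

$cells = [];
walkTables($document, 0, $cells);
print_r($cells);
```

Because the depth is passed down explicitly, each cell knows which level of table it belongs to, however deep the nesting goes.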

Upvotes: 1

Zulakis

Reputation: 8394

This is my solution:

foreach ($raumplan->getElementsByTagName('tr') as $superRow) {
    if ($superRow->getElementsByTagName('table')->length > 0) {
        foreach ($superRow->getElementsByTagName('td') as $superCol) {
            if ($superCol->getElementsByTagName('table')->length > 0) {
                foreach ($superCol->getElementsByTagName('table') as $table) {
                    foreach ($table->getElementsByTagName('tr') as $row) {
                        foreach ($row->getElementsByTagName('td') as $col) {
                        }
                    }
                }
            }
        }
    }
}

It checks whether the current row or cell belongs to the outer table by testing whether it contains a nested table.

Upvotes: 1
