S17514

Reputation: 285

Read data from HTML table with PHP

I'm trying to read data from an HTML table with PHP and store a value from it in a variable called $id. For example, I have this markup:

<tr>
<td>413</td>
<td>Party Hat</td>
<td>0</td>
<td>No</td>
<td><a href="http://clubpenguincheatsnow.com/tools/swfviewer/items.swf?id=413">View SWF</a></td>
</tr>

I have another variable, $array[$i], which holds a search query. I want my PHP code to search through the table until it finds the row containing that query, in this case "Party Hat". Once it finds the row, it should look at the ID, which is the "td" cell just before the name "Party Hat"; here the ID is 413. Finally, $id should hold that ID. How do I do this? Any help would be HIGHLY appreciated!

Upvotes: 2

Views: 12034

Answers (3)

MonkeyMonkey

Reputation: 836

Using Tidy, DOMDocument and DOMXPath (make sure the PHP extensions are enabled), you can do something like this:

<?php
$url = "http://example.org/test.html";

function get_data_from_table($id, $url)
{
    // retrieve the content of that url
    $content = file_get_contents($url);

    // repair bad HTML
    $tidy = tidy_parse_string($content);
    $tidy->cleanRepair();
    $content = (string)$tidy;

    // load into DOM
    $dom = new DOMDocument();
    $dom->loadHTML($content);

    // make xpath-able
    $xpath = new DOMXPath($dom);

    // search for the first td of each tr, where its content is $id
    $query = "//tr/td[position()=1 and normalize-space(text())='$id']";
    $elements = $xpath->query($query);
    if ($elements->length != 1) {
        // not exactly 1 result as expected? return number of hits
        return $elements->length;
    }

    // our td was found
    $element = $elements->item(0);

    // get its parent element (tr)
    $tr = $element->parentNode;
    $data = array();

    // iterate over its td elements
    foreach ($tr->getElementsByTagName("td") as $td) {
        // retrieve the content as text
        $data[] = $td->textContent;
    }

    // return the array of <td> contents
    return $data;
}

echo '<pre>';
print_r(
    get_data_from_table(
        414,
        $url
    )
);
echo '</pre>';

Your HTML source (http://example.org/test.html):

<table><tr>
<td>413</td>
<td>Party Hat</td>
<td>0</td>
<td>No</td>
<td><a href="http://clubpenguincheatsnow.com/tools/swfviewer/items.swf?id=413">View SWF</a></td>
</tr><tr>
<td>414</td>
<td>Party Hat</td>
<td>0</td>
<td>No</td>
<td><a href="http://clubpenguincheatsnow.com/tools/swfviewer/items.swf?id=413">View SWF</a></td>
</tr>

(As you can see, this is not valid HTML, but that doesn't matter because Tidy repairs it first.)

Upvotes: 3

Daan Timmer

Reputation: 15047

This works (although it's a bit ugly; perhaps someone else can come up with a better XPath solution):

$html = <<<HTML
<html>
    <body>
        <table>
            <thead>
                <tr>
                    <td>id</td>
                    <td>name</td>
                    <td>a</td>
                    <td>b</td>
                    <td>c</td>
                </tr>
            </thead>
            <tbody>
                <tr>
                    <td>413</td>
                    <td>Party Hat</td>
                    <td>0</td>
                    <td>No</td>
                    <td>a link</td>
                </tr>
                <tr>
                    <td>414</td>
                    <td>Party Hat 2</td>
                    <td>0</td>
                    <td>No</td>
                    <td>a link</td>
                </tr>
            </tbody>
        </table>
    </body>
</html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$domxpath = new DOMXPath($doc);

$res = $domxpath->query("//*[local-name() = 'td'][text() = 'Party Hat']/../td[position() = '1']");

var_dump($res->length, $res->item(0)->textContent);

Outputs:

int(1)
string(3) "413"

Upvotes: 2

philipp

Reputation: 16485

Try loading the HTML into a new DOMDocument via loadHTML and process it like an XML document, using XPath or another type of query.
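A minimal sketch of that idea, assuming the ID is always the first cell of a row and the name the second, as in the question's markup. The function name find_id_by_name and the inline sample table are made up for illustration. The name is compared in PHP rather than inside the XPath expression, so quotes in the search term cannot break the query:

```php
<?php
// Hypothetical helper (not from the question): returns the ID (first cell)
// of the first row whose second cell equals $name, or null if none matches.
function find_id_by_name(string $html, string $name): ?string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them,
    // since real-world HTML is often not well-formed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Walk every table row and inspect its cells.
    foreach ($xpath->query('//tr') as $tr) {
        $cells = $tr->getElementsByTagName('td');
        if ($cells->length < 2) {
            continue; // not a data row in the assumed layout
        }
        if (trim($cells->item(1)->textContent) === $name) {
            return trim($cells->item(0)->textContent);
        }
    }

    return null;
}

// Sample markup modelled on the question's table.
$html = '<table><tr><td>413</td><td>Party Hat</td><td>0</td></tr>'
      . '<tr><td>414</td><td>Party Hat 2</td><td>0</td></tr></table>';

$id = find_id_by_name($html, 'Party Hat');
var_dump($id); // string(3) "413"
```

In the question's terms, you would pass $array[$i] as the second argument and check for null before using $id.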

Upvotes: 0
