salep

Reputation: 1390

PHP DOM parsing

I'm trying to extract two values from each row of the table shown below. I tried cURL with a regex (I know that's not recommended) and a DOM parser, separately, but neither gave me the values reliably.

The page contains multiple rows, so I'll need to loop over them with foreach. A row should only be used if it matches the structure below exactly.

<tr>
    <td width="75" style="NS">
        <img src="NS" width="64" alt="INEEDTHISVALUE">
    </td>
    <td style="NS">
        <a href="NS">NS</a>
    </td>
    <td style="NS">INEEDTHISVALUETOO</td>
</tr>

NS = non-static values. They differ for each td and a element because the table is colored with inline CSS. They may contain special characters such as ; and /, as well as digits and letters.

I'm using the simple_html_dom class, which can be found here: http://htmlparsing.com/php.html

The code below gets every td on the page, but I need more specific output: only the two values marked in the table row above.

What I've tried so far:

$html = file_get_html("URL");
foreach($html->find('td') as $td) {
    echo $td."<br>";
}

REGEX & CURL

$site = "URL";
$ch = curl_init();
$hc = "YahooSeeker-Testing/v3.9 (compatible; Mozilla 4.0; MSIE 5.5; Yahoo! Search - Web Search)";
curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com');
curl_setopt($ch, CURLOPT_URL, $site);
curl_setopt($ch, CURLOPT_USERAGENT, $hc);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$site = curl_exec($ch);
curl_close($ch);
preg_match_all('@<tr><td width="75" style="(.*?)"><img src="/folder/link/(.*?)" width="64" alt="(.*?)"></td><td style="(.*?)"><a href="/folder2/link2/(.*?)">(.*?)</a></td><td style="(.*?)">(.*?)</td></tr>@', $site, $arr);
var_dump($arr); // returns empty array, WHY?

Upvotes: 0

Views: 242

Answers (1)

Rudi

Reputation: 2995

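You can do it without a third-party library, using PHP's built-in DOMDocument and DOMXPath classes. Here $site is the HTML string returned by curl_exec() in your cURL snippet: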

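$results = array();

// Collect libxml parse warnings instead of printing them; real-world HTML is rarely valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($site); // $site holds the HTML returned by curl_exec()
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Only visit rows whose first cell contains an <img>, so item(0) is never null.
foreach ($xpath->query('//tr[td[1]/img]') as $tr) {
    $results[] = array(
        'img_alt' => $xpath->query('td[1]/img', $tr)->item(0)->getAttribute('alt'),
        'td_text' => trim($xpath->query('td[last()]', $tr)->item(0)->nodeValue)
    );
}

print_r($results);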

For a page containing two such rows, it will give you:

Array
(
    [0] => Array
        (
            [img_alt] => INEEDTHISVALUE 1
            [td_text] => INEEDTHISVALUETOO 1
        )

    [1] => Array
        (
            [img_alt] => INEEDTHISVALUE 2
            [td_text] => INEEDTHISVALUETOO 2
        )

)
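
Since the question asks for rows that match the structure exactly, here is a self-contained sketch that can be run without fetching anything. The sample markup, its inline styles, and the paths in src and href are made up for illustration; only the row layout comes from the question. The XPath predicate is one possible way to express "exactly three cells, an img with alt in the first, a link in the second", not the only one.

```php
<?php
// Hypothetical sample page standing in for the HTML returned by cURL.
$site = <<<HTML
<table>
<tr>
    <td width="75" style="color:#f00;">
        <img src="/img/1.png" width="64" alt="INEEDTHISVALUE 1">
    </td>
    <td style="color:#0f0;">
        <a href="/item/1">Item 1</a>
    </td>
    <td style="color:#00f;">INEEDTHISVALUETOO 1</td>
</tr>
<tr><td colspan="3">A row that does not match the structure</td></tr>
<tr>
    <td width="75" style="color:#333;">
        <img src="/img/2.png" width="64" alt="INEEDTHISVALUE 2">
    </td>
    <td style="color:#666;">
        <a href="/item/2">Item 2</a>
    </td>
    <td style="color:#999;">INEEDTHISVALUETOO 2</td>
</tr>
</table>
HTML;

// Keep parser warnings out of the output.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($site);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Rows with exactly three cells: img[@alt] in the first, a link in the second.
$rows = $xpath->query('//tr[count(td) = 3][td[1]/img[@alt]][td[2]/a]');

$results = array();
foreach ($rows as $tr) {
    $results[] = array(
        // string(...) yields '' instead of failing when a node is missing.
        'img_alt' => $xpath->evaluate('string(td[1]/img/@alt)', $tr),
        'td_text' => trim($xpath->evaluate('string(td[3])', $tr)),
    );
}

print_r($results);
```

The non-matching middle row is skipped by the predicate, so the loop body never has to check for a missing img or td.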

Relevant documentation: PHP: DOMXPath::query
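
As for why the regex finds nothing: a likely cause, judging only from the markup posted in the question, is that the pattern expects the tags to follow each other with no whitespace (`<tr><td ...`), while the posted row has newlines and indentation between them. The sketch below uses a shortened, hypothetical version of the row and of the pattern to show the difference. The DOM approach is still the more robust choice.

```php
<?php
// Shortened sample row, indented the way the question shows it.
$html = <<<HTML
<tr>
    <td width="75" style="color:#f00;">
        <img src="/folder/link/x.png" width="64" alt="INEEDTHISVALUE">
    </td>
</tr>
HTML;

// Same style as the original pattern: no whitespace allowed between tags.
$strict = '@<tr><td width="75" style="(.*?)"><img src="(.*?)" width="64" alt="(.*?)">@';

// Whitespace-tolerant variant: \s* between tags, [^"]* inside attribute values.
$loose = '@<tr>\s*<td width="75" style="[^"]*">\s*<img src="[^"]*" width="64" alt="([^"]*)">@';

$strictCount = preg_match_all($strict, $html, $strictMatches);
$looseCount  = preg_match_all($loose, $html, $looseMatches);

var_dump($strictCount); // no match: the newline after <tr> breaks the pattern
var_dump($looseCount);
var_dump($looseMatches[1]);
```

Even the tolerant variant still breaks as soon as attribute order or quoting changes, which is the usual argument for parsing the document instead.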

Upvotes: 1
