santosh


Regular expression to extract HTML tables using PHP's preg_match_all

I want to extract the tables from an HTML page that contains nested <table> tags, and then extract the <tr> and <td> elements of those tables.

I am using the following, and it works fine for <b> and </b>:

$file = file_get_contents($url);
preg_match_all ("/<b>(.*)<\/b>/U", $file, $pat_array);
print $pat_array[0][0]." <br> ".$pat_array[0][1]."\n";

Can anybody give me a regular expression for nested tables of the form <table (some table attributes)> ...some <tr> and <td> data... </table>? Any href present in the <tr> or <td> cells should be kept, and only the tables I actually need should be extracted.

Example:

$file = "<html> <head> <title> asdf </title> </head> <body bgcolor=red> <table border=1> <table bgcolor=white> (some tr and td data) </table> </table> </body> </html>";

preg_match_all ("regular expression for table tag", $file, $pat_array);
print $pat_array[0][0]." <br> ".$pat_array[0][1]."\n";

Update 1:

When I try the code below, I get this error:

Notice: Undefined offset: 0 in C:\xampp\htdocs\testphp\tabledata.php on line 27

Code:

$file = file_get_contents($url);
$pat_array = Array();
preg_match_all ("/<tr>(.*)<\/tr>/U", $file, $pat_array);
print $pat_array[1][0];

Can anybody also help me with this error?
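The notice means the pattern found no match at all: preg_match_all() still fills $pat_array[1], but with an empty array, so index 0 does not exist. Two likely causes with this pattern are that `<tr>` only matches a row tag without attributes, and that without the `s` modifier `.` does not match newlines. A minimal sketch that checks the return value before indexing, using a hypothetical input that triggers both causes:

```php
<?php
// Hypothetical input: the <tr> has an attribute and spans several lines.
$file = "<table>\n<tr class=\"row\">\n<td>x</td>\n</tr>\n</table>";

// The pattern from the question: it requires a bare <tr>, and without the
// /s modifier '.' does not match newlines, so nothing matches here.
$count = preg_match_all('/<tr>(.*)<\/tr>/U', $file, $pat_array);

if ($count > 0) {
    print $pat_array[1][0];
} else {
    // $pat_array[1] exists but is an empty array, so reading
    // $pat_array[1][0] would raise "Undefined offset: 0".
    print "no <tr> matched\n";
}
```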


Answers (1)

SBH

Don't try to parse HTML with regular expressions; use DOMDocument and DOMXPath instead.

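// silence the warnings libxml raises for malformed real-world HTML
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($file);

$xpath = new DOMXPath($dom);
$tableNodes = $xpath->query('//table'); // select all table nodes, nested ones included

// do something, e.g. print each table's text content
foreach ($tableNodes as $tableNode) {
    print $tableNode->nodeValue;
}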
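To select only the tables you need, XPath predicates can filter them. A minimal sketch, assuming a small hypothetical page in which the inner table sits inside a cell of the outer one (the question's example places <table> directly inside <table>, which is not valid HTML). The `bgcolor="white"` filter is only an illustration; use whatever attribute identifies your tables:

```php
<?php
// Hypothetical sample markup modelled on the question's example.
$file = '<html><body>'
    . '<table border="1"><tr><td>'
    . '<table bgcolor="white"><tr><td><a href="a.html">A</a></td></tr></table>'
    . '</td></tr></table>'
    . '</body></html>';

libxml_use_internal_errors(true); // real-world HTML is often malformed
$dom = new DOMDocument();
$dom->loadHTML($file);
$xpath = new DOMXPath($dom);

// Every table, outer and nested, in document order.
$allTables = $xpath->query('//table');

// Only innermost tables: tables that contain no other table.
$innermost = $xpath->query('//table[not(.//table)]');

// Only tables with a given attribute value.
$white = $xpath->query('//table[@bgcolor="white"]');

printf("all: %d, innermost: %d, white: %d\n",
    $allTables->length, $innermost->length, $white->length);
```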

XPath supports many more query options; have a look here. You probably also want to do something with the selected nodes other than printing their content. If you are looking for the sub-DOM of each table, try this:

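foreach ($tableNodes as $tableNode) {
    $newDom = new DOMDocument();
    // importNode() with $deep = true copies the table and all its
    // descendants into the new document, so no separate cloneNode() is needed
    $clone = $newDom->importNode($tableNode, true);
    $newDom->appendChild($clone);

    $html = $newDom->saveHTML(); // HTML of this table alone; use it here
}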
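For the <tr>/<td> part of the question, relative XPath queries can walk one table's rows and cells and keep any href. A sketch assuming a hypothetical table with `id="prices"`; it reads rows whether or not the markup has an explicit <tbody>, and produces an array of rows, each an array of `['text' => ..., 'href' => ...]` cells:

```php
<?php
// Hypothetical sample table; replace with the page you fetched.
$file = '<table id="prices">'
    . '<tr><td>Item</td><td><a href="/item/1">Details</a></td></tr>'
    . '<tr><td>Other</td><td>no link</td></tr>'
    . '</table>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($file);
$xpath = new DOMXPath($dom);

$rows = [];
$table = $xpath->query('//table[@id="prices"]')->item(0);
if ($table !== null) {
    // Rows of this table only (not of tables nested inside its cells).
    foreach ($xpath->query('./tr | ./thead/tr | ./tbody/tr', $table) as $tr) {
        $cells = [];
        // Relative query: the <td> children of this row only.
        foreach ($xpath->query('./td', $tr) as $td) {
            $link = $xpath->query('.//a[@href]', $td)->item(0);
            $cells[] = [
                'text' => trim($td->textContent),
                // Keep the href when the cell contains a link, else null.
                'href' => $link !== null ? $link->getAttribute('href') : null,
            ];
        }
        $rows[] = $cells;
    }
}

print_r($rows);
```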
