Reputation: 343
I want to extract a table from an HTML page that contains nested <table> tags, and then extract the <tr> and <td> elements of those tables.
I am currently using the following, which works fine for <b> and </b>:
$file = file_get_contents($url);
preg_match_all ("/<b>(.*)<\/b>/U", $file, $pat_array);
print $pat_array[0][0]." <br> ".$pat_array[0][1]."\n";
Can anybody tell me a regular expression that matches a nested <table (some table properties)> ... </table> block, with its data in <tr> and <td> elements? Please keep the href if one is present in the <tr> or <td> fields, and keep in mind that the tables are nested.
Example:
$file = "<html> <head> <title> asdf </title> </head> <body bgcolor = red > <table border = 1> <table bgcolor = white> (some tr and td data) </table> </table> </body> </html>";
preg_match_all ("regular expression for table tag", $file, $pat_array);
print $pat_array[0][0]." <br> ".$pat_array[0][1]."\n";
Update 1:
When I tried the code below, it showed this error:
Notice: Undefined offset: 0 in C:\xampp\htdocs\testphp\tabledata.php on line 27
Code:
$file = file_get_contents($url);
$pat_array = Array();
preg_match_all ("/<tr>(.*)<\/tr>/U", $file, $pat_array);
print $pat_array[1][0];
Can anybody help me with this error as well?
Upvotes: 0
Views: 1234
Reputation: 1908
Don't try to parse HTML with regex; use DOMDocument and DOMXPath instead.
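$dom = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them
libxml_use_internal_errors(true);
$dom->loadHTML($file);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$tableNodes = $xpath->query('//table'); // select all table nodes, nested ones included
// do something, e.g. print the text content of each table
foreach ($tableNodes as $tableNode) {
    print $tableNode->nodeValue;
}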
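Since the question is about nested tables, it may help to see how the query can be narrowed. The following is only a sketch: the $sample markup and the variable names are made up for illustration. It assumes that '//table' matches outer and inner tables alike and uses an ancestor test to tell them apart.

```php
<?php
// Hypothetical sample markup: one outer table containing one nested table.
$sample = '<html><body><table border="1"><tr><td>outer'
        . '<table bgcolor="white"><tr><td>inner</td></tr></table>'
        . '</td></tr></table></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($sample);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// All tables, outer and nested.
$all = $xpath->query('//table');
// Only tables that are not inside another table.
$outer = $xpath->query('//table[not(ancestor::table)]');
// Only tables that are inside another table.
$nested = $xpath->query('//table[ancestor::table]');

print $all->length . " " . $outer->length . " " . $nested->length . "\n";
print $nested->item(0)->getAttribute('bgcolor') . "\n";
```

Which of the three queries fits depends on whether you want every table, only the top-level ones, or only the inner ones.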
XPath supports many more kinds of queries than this one. You probably also want to do something with the selected nodes other than just printing their content. If you are looking for the sub-DOM of each table, try this:
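foreach ($tableNodes as $tableNode) {
    $newDom = new DOMDocument();
    // importNode() with deep = true already returns a copy of the node and its
    // children that belongs to $newDom, so a separate cloneNode() call is not needed
    $clone = $newDom->importNode($tableNode, true);
    $newDom->appendChild($clone);
    $html = $newDom->saveHTML(); // markup of this table, nested tables included
}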
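To get at the rows, cells, and links the question asks about, the same kind of XPath object can be queried relative to each row. This is a rough sketch rather than a finished solution: the helper name extractRows, the array layout it returns, and the sample markup are all hypothetical.

```php
<?php
// Hypothetical helper: returns, for every <tr> in the document, the text and
// link targets of the <td> cells that belong directly to that row.
function extractRows(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $rows = [];
    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        // "./td" is evaluated relative to $tr, so the cells of a nested table
        // are listed under their own rows, not under the outer row.
        foreach ($xpath->query('./td', $tr) as $td) {
            $hrefs = [];
            // ".//a[@href]" finds every link below this cell.
            foreach ($xpath->query('.//a[@href]', $td) as $a) {
                $hrefs[] = $a->getAttribute('href');
            }
            $cells[] = ['text' => trim($td->textContent), 'hrefs' => $hrefs];
        }
        $rows[] = $cells;
    }
    return $rows;
}

// Made-up sample input.
$sample = '<table><tr><td><a href="/a">first</a></td><td>second</td></tr></table>';
$rows = extractRows($sample);

// Check that an entry exists before indexing into the result.
if (isset($rows[0][0])) {
    print $rows[0][0]['text'] . ' -> ' . implode(', ', $rows[0][0]['hrefs']) . "\n";
}
```

The isset() check is the same kind of guard that avoids the "Undefined offset" notice from the question. That notice means the pattern matched nothing, which typically happens when a <tr> tag carries attributes or when a row spans several lines, because "." does not match newlines without the s modifier.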
Upvotes: 1