Regular expression html table extract using PHP function preg_match_all

Question

I want to extract a table from an HTML page that contains nested HTML table tags. After that, I want to extract the <tr> and <td> of the tables.

I am using this. It works fine for <tr> and <td>:

$file = file_get_contents($url);
preg_match_all ("/(.*)/U", $file, $pat_array);
print $pat_array[0][0] . " \n " . $pat_array[0][1] . "\n";

Can anybody tell me a regular expression for nested tables that pulls out the data using <tr> and <td>? Please keep the href if present in the <tr> or <td> fields, and keep the nested tables in mind.

Example:

$file = "   asdf      
 (some tr and td data >    ";

preg_match_all ("regular expression for table tag", $file, $pat_array);
print $pat_array[0][0] . " \n " . $pat_array[0][1] . "\n";

Update 1:

When I tried the code below, it showed this error:

Notice: Undefined offset: 0 in C:\xampp\htdocs\testphp\tabledata.php on line 27

Code:

$file = file_get_contents($url);
$pat_array = Array();
preg_match_all ("/(.*)/U", $file, $pat_array);
print $pat_array[1][0];

Can anybody help me with this error as well?

SBH · Accepted Answer

Don't try to parse HTML with regex; use DOMDocument and DOMXPath instead.

$dom = new DOMDocument();
$dom->loadHtml($file);

$xpath = new DOMXpath($dom);
$tableNodes = $xpath->query('//table'); // select all table nodes

// do something, e.g. print node content
foreach ($tableNodes as $tableNode) {
    print $tableNode->nodeValue;
}
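Since the question asks for the rows and cells of each table, including any href, here is a minimal sketch of how that might look with relative XPath queries. The sample markup in $file and the names $result, $rows, $cells and $links are assumptions for illustration, not part of the original code.

```php
<?php
// Hypothetical sample input with a nested table; replace with your own HTML.
$file = '<table id="outer"><tr><td><a href="/a">A</a></td><td>'
      . '<table id="inner"><tr><td>B</td></tr></table>'
      . '</td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($file);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$result = [];

foreach ($xpath->query('//table') as $tableNode) {
    $rows = [];
    // Relative paths select only rows that belong directly to this table,
    // so rows of nested tables are not mixed in.
    $rowQuery = './tr | ./tbody/tr | ./thead/tr | ./tfoot/tr';
    foreach ($xpath->query($rowQuery, $tableNode) as $rowNode) {
        $cells = [];
        foreach ($xpath->query('./td | ./th', $rowNode) as $cellNode) {
            $links = [];
            // Links directly inside the cell or inside inline markup,
            // but not links that sit in a nested table.
            foreach ($xpath->query('.//a[@href][not(ancestor::table[1] != ancestor::table[last()])]', $cellNode) as $a) {
                $links[] = $a->getAttribute('href');
            }
            // textContent includes the text of any nested table in the cell.
            $cells[] = ['text' => trim($cellNode->textContent), 'links' => $links];
        }
        $rows[] = $cells;
    }
    $result[] = $rows;
}

print_r($result);
```

Each entry of $result corresponds to one table (outer tables first, in document order), each row is a list of cells, and each cell carries its text plus any href values found in it. Note that the link filter above is only reliable for links in top-level tables; for deeper nesting you may prefer to check each link's nearest table ancestor against $tableNode in PHP.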

XPath supports many more query options than this. Also, you probably want to do something else with the selected nodes than just print their content. If you are looking for the sub-DOM of each table, try this:

foreach ($tableNodes as $tableNode) {
    $newDom = new DOMDocument();
    $clone = $tableNode->cloneNode(true);
    $clone = $newDom->importNode($clone, true);
    $newDom->appendChild($clone);

    $html = $newDom->saveHTML();
}
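If only the markup of each table is needed as a string, a shorter variant may be enough: saveHTML() accepts an optional node argument (PHP 5.3.6 and later), so no second document has to be built. This is a sketch under that assumption; the sample $file and the $tables array are made up for illustration.

```php
<?php
// Hypothetical sample input with one table nested inside another.
$file = '<table><tr><td>outer<table><tr><td>inner</td></tr></table></td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($file);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$tables = [];

foreach ($xpath->query('//table') as $tableNode) {
    // Serialize just this node and its subtree, nested tables included.
    $tables[] = $dom->saveHTML($tableNode);
}

print_r($tables);
```

The outer table's string contains the nested table's markup as well, while the nested table also appears as its own entry, because //table matches tables at every depth.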
