Reputation: 522
I want to extract all links and their text from my old website. I can already do this, but in some places I used ol>li tags and in others ul>li tags inside a table, and I have about 400 different pages. I can extract all the links, but I have to switch between ol and ul every time. The easiest and most time-saving approach would be to target the specific <table> that contains the links. However, when I select <table>, it also extracts links from all the other tables, which I don't want.
Table structure to target, which contains ol>li or ul>li tags:
<table style="width:850px;" cellspacing="0" cellpadding="1" border="3">
<tbody>
<tr>
<td style="text-align: center; background-color: rgb(51, 51, 204);">
<h1>My Links</h1>
</td>
</tr>
<tr>
<td>
<ol>
<li><a href="http://websitelink.com/page1.php">Page 1</a></li>
<li><a href="http://websitelink.com/page2.php">Page 2</a></li>
<li><a href="http://websitelink.com/page3.php">Page 3</a></li>
<li><a href="http://websitelink.com/page4.php">Page 4</a></li>
</ol>
...
<ul>
<li><a href="http://websitelink.com/a.php">Page A</a></li>
<li><a href="http://websitelink.com/b.php">Page B</a></li>
<li><a href="http://websitelink.com/c.php">Page C</a></li>
<li><a href="http://websitelink.com/d.php">Page D</a></li>
</ul>
</td>
</tr>
</tbody>
</table>
My Current PHP Code
$html = file_get_contents('http://mywebsitelink.com/pagename.html');
$dom = new DOMDocument;
@$dom->loadHTML($html);
// I have to switch between 'ul' and 'ol' here; I would rather target the table instead
$oltags = $dom->getElementsByTagName('ol');
foreach ($oltags as $list) {
    $links = $list->getElementsByTagName('a');
    foreach ($links as $link) {
        $text = $link->nodeValue;
        $href = $link->getAttribute('href');
        if (!empty($text) && !empty($href)) {
            echo "Link Title: " . $text . " Location: " . $href . "<br />";
        }
    }
}
Upvotes: 0
Views: 597
Reputation: 445
$html = file_get_contents('http://mywebsitelink.com/pagename.html');
$dom = new DOMDocument;
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Select the anchors inside ol>li or ul>li within table cells
$thetags = $xpath->query('//table/tbody/tr/td/ol/li/a|//table/tbody/tr/td/ul/li/a');
// The query already returns the <a> elements, so iterate over them directly
foreach ($thetags as $onelink) {
    $text = $onelink->nodeValue;
    $href = $onelink->getAttribute('href');
    if (!empty($text) && !empty($href)) {
        echo "Link Title: " . $text . " Location: " . $href . "<br />";
    }
}
[...]
Upvotes: 0
Reputation: 15141
You can try this one. Here we use DOMDocument and run a DOMXPath query over the anchors inside li elements. The XPath query //table/tbody/tr/td/ol/li/a|//table/tbody/tr/td/ul/li/a searches for //table/tbody/tr/td/ol/li/a or //table/tbody/tr/td/ul/li/a; the | operator combines the two paths into a single result set.
// $string is assumed to hold the HTML of the page
$links = array();
$domDocument = new DOMDocument();
$domDocument->loadHTML($string);
$domXPath = new DOMXPath($domDocument);
// Query the anchors inside ol>li or ul>li within table cells
$results = $domXPath->query("//table/tbody/tr/td/ol/li/a|//table/tbody/tr/td/ul/li/a");
foreach ($results as $result)
{
    $links[] = $result->getAttribute("href"); // gather the href attribute
}
print_r($links);
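If the page has several tables, the query can be narrowed to one of them with an XPath predicate. The following is a minimal sketch, assuming the target table is the only one whose h1 reads "My Links", as in the question's markup. The inline $html sample and the layout of the $links array are illustrative assumptions, not taken from the real pages.

```php
<?php
// Sample markup modeled on the question: one unrelated table plus the target table.
$html = <<<'HTML'
<table><tbody><tr><td>
<ul><li><a href="http://websitelink.com/other.php">Other</a></li></ul>
</td></tr></tbody></table>
<table border="3"><tbody>
<tr><td><h1>My Links</h1></td></tr>
<tr><td>
<ol><li><a href="http://websitelink.com/page1.php">Page 1</a></li></ol>
<ul><li><a href="http://websitelink.com/a.php">Page A</a></li></ul>
</td></tr>
</tbody></table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Assumption: the target table is the one containing an <h1> with the text "My Links".
// "//li/a" matches anchors in list items regardless of whether the list is <ol> or <ul>.
$query = '//table[.//h1[normalize-space(.) = "My Links"]]//li/a';

$links = [];
foreach ($xpath->query($query) as $a) {
    $text = trim($a->nodeValue);
    $href = $a->getAttribute('href');
    if ($text !== '' && $href !== '') {
        $links[] = ['text' => $text, 'href' => $href];
    }
}
print_r($links);
```

Because the predicate selects the table by its heading, the link in the unrelated first table is skipped, and the same query covers both ol and ul lists. If the real pages identify the table differently (for example by an attribute), the predicate would need to change accordingly.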
Upvotes: 1