Reputation: 591
HTML from the site:
<ul id="blahlist">
<li><a href="http://blahblah.com">blah blah</a></li>
<li><a href="http://blahblah2.com">blah blah 2</a></li>
......
</ul>
My code:
$dom = new simple_html_dom();
$dom->load_file("blah.html");
// $csv_out is assumed to be a file handle opened earlier (not shown)
$div_category = $dom->find("#blahlist");
foreach ($div_category as &$ul) {
    $a_list = $ul->find("a");
    foreach ($a_list as &$anchor) {
        $csv_array = array($anchor->plaintext, $anchor->getAttribute("href"));
        fputcsv($csv_out, $csv_array);
        print_r($anchor);
    }
}
The problem is that it only shows the first row (the first line) and not the rest of the list within blahlist. Am I doing something wrong? Could something to do with the <li> elements have stopped it after the first line?
Upvotes: 1
Views: 723
Reputation: 20675
Scrape using regular expressions:
$html = <<<EOF
<ul id="blahlist">
<li><a href="http://blahblah.com">blah blah</a></li>
<li><a href="http://blahblah2.com">blah blah 2</a></li>
<li><a href="http://blahblah2.com">blah blah 3</a></li>
<li><a href="http://blahblah2.com">blah blah 4</a></li>
</ul>
EOF;
$ul_id = "blahlist";
// Capture everything between the opening and closing tags of the target <ul>
if (preg_match("#<ul[^<>]+id=[\"']?{$ul_id}[\"']?[^<>]*>([\s\S]+?)</ul>#i", $html, $match))
{
    $lis = $match[1];
    // Group 1 is the href, group 2 is the link text
    preg_match_all("#<li[^<>]*>\s*<a[^<>]+href=[\"']?([^<>\"']+)[\"']?[^<>]*>([\s\S]+?)</a>#i", $lis, $matches);
    foreach ($matches[1] as $k => $href) {
        $href = strip_tags($href);
        $text = strip_tags($matches[2][$k]);
        print "$text [$href]<br>";
    }
}
To target a different list, just edit the id of the ul on this line:
$ul_id = "blahlist";
Result:
blah blah [http://blahblah.com]
blah blah 2 [http://blahblah2.com]
blah blah 3 [http://blahblah2.com]
blah blah 4 [http://blahblah2.com]
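Since the question writes CSV rows rather than printing, here is a rough sketch of the same regex approach feeding fputcsv. It assumes an in-memory stream in place of the real output file, and the variable names $ul_pattern, $a_pattern, $links and $csv_text are only illustrative. The sample markup is shortened to two items.

```php
<?php
$html = '<ul id="blahlist">'
      . '<li><a href="http://blahblah.com">blah blah</a></li>'
      . '<li><a href="http://blahblah2.com">blah blah 2</a></li>'
      . '</ul>';
$ul_id = 'blahlist';

// Assumption: php://memory stands in for the real CSV file handle.
$csv = fopen('php://memory', 'w+');

// Same patterns as above, with the id escaped via preg_quote.
$ul_pattern = '#<ul[^<>]+id=["\']?' . preg_quote($ul_id, '#') . '["\']?[^<>]*>([\s\S]+?)</ul>#i';
if (preg_match($ul_pattern, $html, $match)) {
    $a_pattern = '#<li[^<>]*>\s*<a[^<>]+href=["\']?([^<>"\']+)["\']?[^<>]*>([\s\S]+?)</a>#i';
    // PREG_SET_ORDER gives one array per link: [full match, href, text]
    preg_match_all($a_pattern, $match[1], $links, PREG_SET_ORDER);
    foreach ($links as $link) {
        // Delimiter, enclosure and escape are passed explicitly
        fputcsv($csv, [strip_tags($link[2]), $link[1]], ',', '"', '\\');
    }
}

// Read back what was written so it can be inspected
rewind($csv);
$csv_text = stream_get_contents($csv);
fclose($csv);
echo $csv_text;
```

Swapping php://memory for a real path in fopen would write the rows to disk instead.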
Upvotes: 1
Reputation: 3721
How about
$dom->find("#blahlist li");
This "grabs" all the li elements under #blahlist.
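For comparison, the same kind of selection can be sketched with PHP's bundled DOM extension instead of simple_html_dom. This is only a sketch under assumptions: it requires ext-dom, the helper name extract_links is made up for illustration, and the sample markup is a shortened copy of the list in the question.

```php
<?php
// Hypothetical helper: returns [text, href] pairs for every <a> found
// in the <li> items of the <ul> with the given id.
// Assumes $ul_id contains no quote characters.
function extract_links(string $html, string $ul_id): array
{
    // Collect libxml warnings internally instead of printing them
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    // Roughly the XPath counterpart of the "#blahlist li a" selector
    foreach ($xpath->query("//ul[@id='" . $ul_id . "']/li//a") as $anchor) {
        $rows[] = [trim($anchor->textContent), $anchor->getAttribute('href')];
    }
    return $rows;
}

$html = '<ul id="blahlist">'
      . '<li><a href="http://blahblah.com">blah blah</a></li>'
      . '<li><a href="http://blahblah2.com">blah blah 2</a></li>'
      . '</ul>';

// Write one CSV row per link to standard output
$out = fopen('php://output', 'w');
foreach (extract_links($html, 'blahlist') as $row) {
    fputcsv($out, $row, ',', '"', '\\');
}
fclose($out);
```

Every link in the list ends up in the returned array, so a row count that stops at one would point at the surrounding code rather than at the li markup.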
Upvotes: 1