Reputation: 1077
I'm trying to learn how to get data out of a page with PHP. I can see how to get everything between tags, but is there a way to get the content of tags nested within other tags?
In the HTML below, how would I access the content of one of the bold spans, for example the second one?
<html>
<div class="padding10">
<span class="bold"></span>
<span class="bold"></span>
<span class="bold"></span>
<span class="bold"></span>
</div>
</html>
I tried the following, which gets me the content of the padding10 div, but I don't know how to go further and get the bold spans. Nothing I've tried works.
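// Fetch the whole page
$file_string = file_get_contents('http://www.test.com/index.html');
// Capture everything inside the padding10 div (non-greedy; assumes no nested divs)
preg_match('/<div class="padding10">(.*?)<\/div>/si', $file_string, $padding_10);
// Capture the content of each bold span in that div (non-greedy, so each span matches separately)
preg_match_all('/<span class="bold">(.*?)<\/span>/si', $padding_10[1], $spans_10);
// $spans_10[1] holds the captured contents; index 1 is the second span
$second_span = $spans_10[1][1];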
I'm starting to realise from what I'm reading that this is probably the wrong, or at least an inefficient, way to go about this, but any help would be great. Thanks.
Upvotes: 1
Views: 696
Reputation: 5357
Maybe phpQuery could help?
"a server-side, chainable, CSS3 selector driven Document Object Model (DOM) API based on jQuery JavaScript Library." It lets you select elements from a parsed HTML document with CSS selectors, which is usually better suited to parsing and traversing HTML than writing regexes by hand.
http://code.google.com/p/phpquery/
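If you'd rather not add a library, the same parse-then-select idea can be sketched with PHP's built-in DOM extension (DOMDocument and DOMXPath). This is not phpQuery, just a rough illustration of the approach. The span contents ("first", "second", ...) are made up for the example, since the spans in the question are empty, and the markup is inlined here instead of being fetched with file_get_contents.

```php
<?php
// Sample markup mirroring the question; the span text is invented for illustration.
$html = <<<HTML
<html>
<div class="padding10">
<span class="bold">first</span>
<span class="bold">second</span>
<span class="bold">third</span>
<span class="bold">fourth</span>
</div>
</html>
HTML;

// Collect libxml parse warnings internally instead of printing them,
// since real-world HTML is rarely perfectly formed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Every span whose class attribute is exactly "bold", anywhere inside
// a div whose class attribute is exactly "padding10".
$spans = $xpath->query('//div[@class="padding10"]//span[@class="bold"]');

// DOMNodeList is zero-indexed, so item(1) is the second span.
$second = $spans->item(1)->textContent;
echo $second, "\n";
```

Note that the `@class="..."` tests only match when the attribute is exactly that string; an element with several classes would need a `contains()` based expression instead.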
Upvotes: 2