I need to scrape data from an HTML page that contains markup like this:
<div style="margin-top: 0px; padding-right: 5px;" class="lftFlt1">
<a href="" onclick="setList1(157204);return false;" class="contentSubHead" title="USA USA">USA USA</a>
<div style="display: inline; margin-right: 10px;"><a href="" onclick="rate('157204');return false;"><img src="http://icdn.raaga.com/3_s.gif" title="RATING: 3.29" style="position: relative; left: 5px;" height="10" width="60" border="0"></a></div>
</div>
I need to extract the link text "USA USA" and the number 157204 from its onclick attribute (setList1(157204)).
...
Upvotes: 0
Views: 1192
Reputation: 3239
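I did it this way (with Simple HTML DOM, $coll being the parsed page):
foreach ($coll->find('div[class=lftFlt1]') as $element) {
    // onclick holds "setList1(157204);return false;"
    $text = $element->find("a[class=contentSubHead]", 0)->onclick;
}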
Upvotes: 0
Reputation: 1
By far the best library for scraping is Simple HTML DOM. It basically uses jQuery selector syntax.
http://simplehtmldom.sourceforge.net/
Here is how you'd get the data in this example:
include("simple_html_dom.php");
// file_get_html() loads a file or URL; str_get_html() expects an HTML string
$dom=file_get_html("page.html");
$text=$dom->find(".lftFlt1 a.contentSubHead",0)->plaintext;
//or
$text=$dom->find(".lftFlt1 a.contentSubHead",0)->title;
Upvotes: 0
Reputation: 22307
You should use DOMDocument (optionally with DOMXPath). Regex is generally not recommended for parsing HTML.
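A minimal sketch of the DOMDocument approach, assuming the question's markup is already in a string. `extractLinks()` is a hypothetical helper name, not part of any library. It walks every `<a>` element, keeps only those with class `contentSubHead` (so the rating link is skipped), and pairs the link text with the number taken from `onclick` via preg_match().

```php
<?php
// Illustrative input: the markup from the question.
$html = <<<'HTML'
<div style="margin-top: 0px; padding-right: 5px;" class="lftFlt1">
<a href="" onclick="setList1(157204);return false;" class="contentSubHead" title="USA USA">USA USA</a>
<div style="display: inline; margin-right: 10px;"><a href="" onclick="rate('157204');return false;"><img src="http://icdn.raaga.com/3_s.gif" title="RATING: 3.29" style="position: relative; left: 5px;" height="10" width="60" border="0"></a></div>
</div>
HTML;

// Hypothetical helper: returns [[text, id], ...] for every a.contentSubHead link.
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate real-world, non-validating HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $results = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->getAttribute('class') !== 'contentSubHead') {
            continue; // skip other links, e.g. the rating link
        }
        $onclick = $a->getAttribute('onclick'); // "setList1(157204);return false;"
        if (preg_match('/setList1\((\d+)\)/', $onclick, $m)) {
            $results[] = [trim($a->textContent), (int) $m[1]];
        }
    }
    return $results;
}

print_r(extractLinks($html));
```

Checking the class attribute with `!==` is an exact match; an element with several classes (e.g. `class="contentSubHead other"`) would need a looser test.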
Upvotes: 2
Reputation: 316939
Please go through my previous answers about how to handle HTML with DOM.
XPath to get the text content of all anchor elements:
//a/text()
XPath to get the title attribute of all anchor elements:
//a/@title
XPath to get the onclick attribute of all anchor elements:
//a/@onclick
You will have to use a string function (for example preg_match() or sscanf()) to extract the number from the onclick text.
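One way to run those three expressions in PHP is DOMXPath. This is a sketch assuming the question's markup in a string, with sscanf() standing in as the string function for the id. Because the expressions match every anchor, `//a/@onclick` also returns the rating link's `rate('157204')` handler, which the sscanf() check filters out.

```php
<?php
// Illustrative input: the markup from the question.
$html = <<<'HTML'
<div style="margin-top: 0px; padding-right: 5px;" class="lftFlt1">
<a href="" onclick="setList1(157204);return false;" class="contentSubHead" title="USA USA">USA USA</a>
<div style="display: inline; margin-right: 10px;"><a href="" onclick="rate('157204');return false;"><img src="http://icdn.raaga.com/3_s.gif" title="RATING: 3.29" style="position: relative; left: 5px;" height="10" width="60" border="0"></a></div>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate non-validating HTML
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// //a/text(): the rating link only wraps an <img>, so it has no text node.
$texts = [];
foreach ($xpath->query('//a/text()') as $node) {
    $texts[] = $node->nodeValue;
}

// //a/@title: only the first link carries a title (the <img> title is not on an <a>).
$titles = [];
foreach ($xpath->query('//a/@title') as $attr) {
    $titles[] = $attr->value;
}

// //a/@onclick: keep only the integer from handlers that start with "setList1(".
$ids = [];
foreach ($xpath->query('//a/@onclick') as $attr) {
    $parts = sscanf($attr->value, 'setList1(%d)');
    if (is_array($parts) && isset($parts[0])) {
        $ids[] = $parts[0];
    }
}

print_r([$texts, $titles, $ids]);
```

On a full page, a narrower expression such as `//div[@class="lftFlt1"]/a[@class="contentSubHead"]/@onclick` would avoid picking up unrelated anchors.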
Upvotes: 1
Reputation: 5548
Use regex:
/setList1\(([0-9]+)\)[^>]+title="([^"]+)"/si
with preg_match() (first match only) or preg_match_all() (every match on the page).
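A minimal sketch of that pattern with preg_match_all(), run against the question's markup (the variable names are illustrative). PREG_SET_ORDER groups each match's captures together, so each entry holds the id in `[1]` and the title in `[2]`. The pattern depends on attribute order: if `title` came before `onclick` in the tag, it would not match, which is one reason the other answers prefer a DOM parser.

```php
<?php
// Illustrative input: the markup from the question.
$html = <<<'HTML'
<div style="margin-top: 0px; padding-right: 5px;" class="lftFlt1">
<a href="" onclick="setList1(157204);return false;" class="contentSubHead" title="USA USA">USA USA</a>
<div style="display: inline; margin-right: 10px;"><a href="" onclick="rate('157204');return false;"><img src="http://icdn.raaga.com/3_s.gif" title="RATING: 3.29" style="position: relative; left: 5px;" height="10" width="60" border="0"></a></div>
</div>
HTML;

// Group 1: the numeric argument of setList1(...); group 2: the title attribute.
$pattern = '/setList1\(([0-9]+)\)[^>]+title="([^"]+)"/si';

$pairs = [];
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $pairs[] = ['id' => (int) $m[1], 'title' => $m[2]];
    }
}

print_r($pairs);
```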
Upvotes: 1