Ram

Reputation:

Scrape data from an HTML page with PHP

I need to scrape data from an HTML page:

<div style="margin-top: 0px; padding-right: 5px;" class="lftFlt1">

    <a href="" onclick="setList1(157204);return false;" class="contentSubHead" title="USA USA">USA USA</a>
    <div style="display: inline; margin-right: 10px;"><a href="" onclick="rate('157204');return false;"><img src="http://icdn.raaga.com/3_s.gif" title="RATING: 3.29" style="position: relative; left: 5px;" height="10" width="60" border="0"></a></div>
    </div>

I need to extract the text "USA USA" and the number 157204 from the onclick="setList1(...)" attribute.

Upvotes: 0

Views: 1192

Answers (5)

Kaletha

Reputation: 3239

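I did it this way:

// $coll is assumed to be the parsed Simple HTML DOM document
$element = $coll->find('div[class=lftFlt1]', 0);
// read the onclick attribute of the link inside that div
$text = $element->find('a[class=contentSubHead]', 0)->onclick;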

Upvotes: 0

steve

Reputation: 1

In my experience the best library for scraping is Simple HTML DOM. It uses a jQuery-like selector syntax.

http://simplehtmldom.sourceforge.net/

The way you'd get the data in this example:

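include("simple_html_dom.php");
// file_get_html() loads and parses a file or URL; str_get_html() expects the markup itself as a string
$dom = file_get_html("page.html");
$text = $dom->find(".lftFlt1 a.contentSubHead", 0)->plaintext;
// or
$text = $dom->find(".lftFlt1 a.contentSubHead", 0)->title;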

Upvotes: 0

Shubham

Reputation: 22307

You should use DOMDocument, optionally together with XPath. Regular expressions are generally not recommended for parsing HTML.

Upvotes: 2

Gordon

Reputation: 316939

Please go through my previous answers about how to handle HTML with DOM.

XPath to get the text content of all anchor elements:

//a/text()

XPath to get the title attribute of all anchor elements:

//a/@title

XPath to get the onclick attribute of all anchor elements:

//a/@onclick

You will have to use some string function to extract the number from the onclick text.
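As a rough sketch of how these pieces might fit together, the snippet below loads the markup from the question into DOMDocument, selects the anchor with DOMXPath, and pulls the number out of the onclick value. The narrower XPath expression and the use of preg_match() as the "string function" are my own choices for illustration, not part of the answer above; the regex is applied only to the attribute value, not to the HTML.

```php
<?php
// Sample markup copied from the question (rating link omitted for brevity).
$html = <<<'HTML'
<div style="margin-top: 0px; padding-right: 5px;" class="lftFlt1">
    <a href="" onclick="setList1(157204);return false;" class="contentSubHead" title="USA USA">USA USA</a>
</div>
HTML;

$doc = new DOMDocument();
// Real-world pages are rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$results = [];

// Assumption: the links of interest are the a.contentSubHead elements inside div.lftFlt1.
foreach ($xpath->query('//div[@class="lftFlt1"]/a[@class="contentSubHead"]') as $a) {
    $onclick = $a->getAttribute('onclick');
    // Extract the digits from "setList1(157204);return false;".
    if (preg_match('/setList1\((\d+)\)/', $onclick, $m)) {
        $results[] = ['id' => $m[1], 'title' => $a->getAttribute('title')];
    }
}

print_r($results);
```

For a live page you would probably replace the inline string with the downloaded markup, for example via file_get_contents().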

Upvotes: 1

Piotr Müller

Reputation: 5548

Use a regex:

/setList1\(([0-9]+)\)[^>]+title="([^"]+)"/si

Apply it with preg_match() for the first match, or preg_match_all() for every match on the page.
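A minimal sketch of how that pattern could be used with preg_match_all(), run against the markup from the question. The variable names and the PREG_SET_ORDER flag are illustrative choices; the pattern also assumes the title attribute follows onclick inside the same tag, so it may break if the page changes its attribute order.

```php
<?php
// Sample markup copied from the question (rating link omitted for brevity).
$html = <<<'HTML'
<div style="margin-top: 0px; padding-right: 5px;" class="lftFlt1">
    <a href="" onclick="setList1(157204);return false;" class="contentSubHead" title="USA USA">USA USA</a>
</div>
HTML;

// Group 1 captures the numeric id, group 2 the title attribute.
$pattern = '/setList1\(([0-9]+)\)[^>]+title="([^"]+)"/si';

$pairs = [];
// PREG_SET_ORDER yields one array per match: [full match, id, title].
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        $pairs[$match[1]] = $match[2];
    }
}

foreach ($pairs as $id => $title) {
    echo $id, ' => ', $title, PHP_EOL;
}
```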

Upvotes: 1
