cola

Reputation: 12466

How can I download and parse a portion of a web page?

I don't want to download the whole web page. It would take time and use a lot of memory.

How can I download just a portion of that web page and then parse it?

Suppose I only need to download the <div id="entryPageContent" class="cssBaseOne">...</div>. How can I do that?

Upvotes: 2

Views: 2776

Answers (2)

Marc B

Reputation: 360702

You can't ask a server for "only this piece of HTML" from a URL. HTTP only supports byte ranges for partial downloads and has no concept of HTML/XML document trees.

So you'll have to download the entire page, load it into a DOM parser, and then extract only the portion(s) you need.

e.g.

// Fetch the whole page, then pull out just the element we need with the DOM parser
$html = file_get_contents('http://example.com/somepage.html');
$dom = new DOMDocument();
@$dom->loadHTML($html); // suppress warnings from real-world malformed HTML
$div = $dom->getElementById('entryPageContent');

// Serialize only that element back to an HTML string
$content = $dom->saveHTML($div);

Upvotes: 5

kuba

Reputation: 7389

Using this:

curl_setopt($ch, CURLOPT_RANGE, "0-10000");

will make cURL download only the first 10 kB of the webpage. However, it only works if the server supports byte ranges; many interpreted scripts (CGI, PHP, ...) ignore it.
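For context, here is a minimal sketch of how that option fits into a complete request (the URL and byte range are placeholder values):

// Sketch: fetch only the first ~10 kB of a page via an HTTP Range request.
// Only helps if the server actually honors the Range header.
$ch = curl_init('http://example.com/somepage.html');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_RANGE, '0-10000');     // request bytes 0 through 10000 only
$partialHtml = curl_exec($ch);
curl_close($ch);

Keep in mind that a byte range may cut the HTML off mid-element, so the truncated markup still has to be parsed tolerantly.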

Upvotes: 0
