Reputation: 5391
Whenever i use curl(php)
to download a page it downloads everything on the page like images, css files or javascript files
. but sometimes i dont want to download these. can i control the resources that i download through curl. i have gone through the manual but i havent found an option that can make this happen? Please dont suggest getting the whole page and then using some regex
magic because that would still download the page and increase load time.
this is a demo code where i download a page from mozilla.com
<?php
$url="http://www.mozilla.com/en-US/firefox/new/";
$userAgent="Mozilla/5.0 (Windows NT 5.1; rv:2.0)Gecko/20100101 Firefox/4.0";
//$accept="text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$encoding="gzip, deflate";
$header['lang']="en-us,en;q=0.5";
$header['charset']="ISO-8859-1,utf-8;q=0.7,*;q=0.7";
$header['conn']="keep-alive";
$header['keep-alive']=115;
$ch=curl_init();
curl_setopt($ch,CURLOPT_USERAGENT,$userAgent);
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_ENCODING,$encoding);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch,CURLOPT_AUTOREFERER,1);
$content=curl_exec($ch);
curl_close($ch);
echo $content;
?>
when i echo the content it shows the images too. i saw in firebug's network tab
that images and external js
files are being downloaded
Upvotes: 0
Views: 2030
Reputation: 360572
PHP's curl only fetches what you tell it to. It doesn't parse html to look for javascript/css <link>
tags and <img>
tags and doesn't fetch them automatically.
If you have curl downloading those resources, then it's your code telling it to do so, and it's up to you to decide what to fetch and what not to. Curl only does what you tell it to.
Upvotes: 1