user3535762

Reputation: 1

Fetch HTML content after the page is fully loaded using curl

I am having a bit of a problem here. When I load the page in a browser, it takes at least 10 seconds to display the complete result. When I use curl, it only returns the HTML content the page has at the moment of the request. I want curl to wait at least 10 seconds so that it fetches the complete result. This is my code:

<?php

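// cURL's cookie options expect a file path; tmpfile() returns a stream resource, so tempnam() is used here.
$cookie = tempnam(sys_get_temp_dir(), 'cookie');
$userAgent = 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31';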

$ch = curl_init('http://filippo.io/Heartbleed/#www.example.com:433');

$options = array(
    CURLOPT_CONNECTTIMEOUT => 20,
    CURLOPT_USERAGENT => $userAgent,
    CURLOPT_AUTOREFERER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_COOKIEFILE => $cookie,
    CURLOPT_COOKIEJAR => $cookie,
    CURLOPT_SSL_VERIFYPEER => 0,
    CURLOPT_SSL_VERIFYHOST => 0
);

curl_setopt_array($ch, $options);
$kl = curl_exec($ch);
curl_close($ch);
echo $kl;
?>

Kindly tell me where I am making a mistake and what I can remove or add to make the script work. Thanks.

Upvotes: 0

Views: 7302

Answers (3)

Clarenceli

Reputation: 31

I had the same problem. CURLOPT_CONNECTTIMEOUT is the maximum time curl waits while trying to connect: if it cannot connect within that time, it gives up. CURLOPT_TIMEOUT is the maximum time curl allows for the request as a whole: if the page has not been fetched within that period, it gives up. Both are upper limits, so there is no option that makes curl wait a specific number of seconds before fetching. Instead, you could use JavaScript to load the page in a browser window and read the content from there, or you could drive a real browser with a Python web driver (for example, Selenium WebDriver).
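To make the difference between the two options concrete, here is a minimal sketch. The helper names buildTimeoutOptions and fetchOnce are made up for illustration and are not part of cURL or of the question's code. Both values are upper limits on how long curl may take; neither one delays the request.

```php
<?php
// Hypothetical helper (not from the original post): collects both time limits.
function buildTimeoutOptions(int $connectSeconds, int $totalSeconds): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $connectSeconds, // give up if no connection is made within this time
        CURLOPT_TIMEOUT        => $totalSeconds,   // give up if the whole request takes longer than this
    ];
}

// Hypothetical helper: performs a single request with the given options.
// Returns the response body as a string, or false on failure (including a timeout).
function fetchOnce(string $url, array $options)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    // The handle is released when $ch goes out of scope.
    return curl_exec($ch);
}
```

Calling fetchOnce('http://www.example.com/', buildTimeoutOptions(20, 10)) would return as soon as the server has sent the static HTML, which is usually well before either limit is reached.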

Upvotes: 0

Aleks G

Reputation: 57306

I'm not sure where you got the 10 seconds from. On my mid-range Linux laptop, the page took about 3 seconds to load in Firefox. However, you are confusing the time it takes for the HTML page to load with the time it takes for all the additional/dynamic content to load.

When you hit the URL, you get a very small static HTML page along with some JavaScript, CSS, images, etc. The delay you see is the time it takes for the JavaScript to execute AJAX requests, as well as for images to load completely.

If you use curl, you are only getting the static HTML and nothing else along with it. No delay will help you get the full information, unless you are planning on implementing a full JavaScript engine and HTML parser, then loading all the other resources and executing JavaScript code as necessary.
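As a rough illustration of what curl hands back, the sketch below parses an HTML string and lists the external scripts it references. The function name listScriptSources and the sample markup are invented for this example; the real page will look different. The point is that the response only names the scripts, and nothing in the PHP script runs them.

```php
<?php
// Hypothetical helper: lists the external scripts an HTML document refers to.
function listScriptSources(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them for imperfect markup.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $sources = [];
    foreach ($doc->getElementsByTagName('script') as $script) {
        $src = $script->getAttribute('src');
        if ($src !== '') {
            $sources[] = $src;
        }
    }
    return $sources;
}

// Made-up page: the result container stays empty until app.js runs in a browser.
$html = '<html><body><div id="result"></div>'
      . '<script src="/static/app.js"></script></body></html>';

print_r(listScriptSources($html));
```

With a response like this, waiting longer changes nothing: the result container is only filled in once a browser (or a browser automation tool) executes the referenced script.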

I strongly advise you to rethink your approach.

Upvotes: 2

Klemen Tusar

Reputation: 9689

Add this to your $options array: CURLOPT_TIMEOUT => 10, where 10 is the maximum number of seconds the whole request may run before curl times out.

http://altafphp.blogspot.com/2012/12/difference-between-curloptconnecttimeou.html

Upvotes: 0
