Effective way of fetching web pages

Question

I have to fetch multiple web pages, let's say 100 to 500. Right now I am using curl to do so.

function get_html_page($url) {
    //create curl resource
    $ch = curl_init();

    //set url
    curl_setopt($ch, CURLOPT_URL, $url);

    //return the transfer as a string
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, FALSE);

    //$output contains the output string
    $html = curl_exec($ch);

    //close curl resource to free up system resources
    curl_close($ch);

    return $html;
}

My main concern is the total time my script takes to fetch all these web pages. I know the time depends largely on my internet speed, so most of it is spent in the $html = curl_exec($ch); call.

I was thinking that instead of creating and destroying a curl instance for every web page, I could create it once, reuse it for each page, and destroy it at the end. Something like:

    

<?php
    //create curl resource once
    $ch = curl_init();
    .
    .
    .
    //close curl resource to free up system resources
    curl_close($ch);
?>

Will it make any significant difference in the total time taken? If there is a better approach, please let me know about that too.

JAL · Accepted Answer

How about benchmarking it? The second way may be more efficient, but I don't think it will add up to much. I'm sure your system can create and destroy curl instances in microseconds. It has to make the same HTTP requests either way, too, although a reused handle lets curl keep a connection open across requests to the same host.
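To make the benchmark suggestion concrete, here is a minimal sketch that times both approaches over the same URL list. The names benchmark_fetch(), fetch_with_new_handles() and fetch_with_shared_handle() are made up for this illustration and are not from the question; the curl options simply mirror get_html_page().

```php
<?php
// Hypothetical helper: run one fetch strategy over a URL list and time it.
// $strategy is any callable that takes an array of URLs and returns the pages.
function benchmark_fetch(callable $strategy, array $urls): array {
    $start = microtime(true);
    $pages = $strategy($urls);
    return ['seconds' => microtime(true) - $start, 'pages' => $pages];
}

// Strategy A: create and destroy a curl handle for every URL,
// as get_html_page() in the question does.
function fetch_with_new_handles(array $urls): array {
    $pages = [];
    foreach ($urls as $url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HEADER, false);
        $pages[$url] = curl_exec($ch); // false on failure
        curl_close($ch);
    }
    return $pages;
}

// Strategy B: create one handle, change only the URL between requests,
// and close the handle once at the end.
function fetch_with_shared_handle(array $urls): array {
    $pages = [];
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, false);
    foreach ($urls as $url) {
        curl_setopt($ch, CURLOPT_URL, $url);
        $pages[$url] = curl_exec($ch); // false on failure
    }
    curl_close($ch);
    return $pages;
}
```

Calling benchmark_fetch('fetch_with_new_handles', $urls) and benchmark_fetch('fetch_with_shared_handle', $urls) on the same list, a few times each, and comparing the 'seconds' values should show whether the difference matters for your pages. Network timing is noisy, so a single run proves little.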

If you were running many of these at the same time and were worried about system resources rather than time, it might be worth exploring. As you noted, most of the time is spent waiting on network transfers, so I don't think you'll notice a change in overall time with either method.
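Since the waiting is on the network, running the transfers at the same time is the option most likely to cut the total. A rough sketch using PHP's curl_multi functions follows. fetch_pages_concurrently() and its $timeoutSeconds parameter are names invented for this example, and the 30 second default is an arbitrary assumption rather than a recommendation.

```php
<?php
// Hypothetical helper: fetch several URLs concurrently with curl_multi.
// Returns an array mapping each URL to the body that was received.
function fetch_pages_concurrently(array $urls, int $timeoutSeconds = 30): array {
    $mh = curl_multi_init();
    $handles = [];

    // One easy handle per distinct URL, all attached to the multi handle.
    foreach (array_unique($urls) as $url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HEADER, false);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none are still running.
    $running = 0;
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            // Wait for activity on any connection, for at most one second.
            curl_multi_select($mh, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the bodies; a failed transfer may yield an empty or null body.
    $pages = [];
    foreach ($handles as $url => $ch) {
        $pages[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $pages;
}
```

With 100 to 500 URLs it is probably kinder to the remote servers, and to your own machine, to pass the list in batches, for example by looping over array_chunk($urls, 20) and calling the function once per batch. The batch size of 20 is only a guess to tune by measuring.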
