user2963178

Reputation: 5

curl_multi_exec returning blanks sometimes

Basically there are a couple hundred subpages I'm pulling off a site (as a test run), and then I have to parse each of those subpages for some data. All of this works fine, but doing the requests serially takes too long because there are so many pages. So I used curl_multi_exec, and now I'm running into the problem where some of those pages come back blank. Which pages are blank is quite random, so I'm assuming the web server is deciding not to respond because I'm hitting it with 200 requests at once. Is there a way to either limit the number of simultaneous requests, have curl redo a request that didn't return properly, or otherwise deal with this problem?

Existing curl code:

function multiple_html_requests($nodes){
    $mh = curl_multi_init();
    $curl_array = array();
    foreach ($nodes as $i => $url){
        $curl_array[$i] = curl_init($url);
        curl_setopt($curl_array[$i], CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $curl_array[$i]);
    }

    // Drive all handles to completion, waiting on socket activity
    // with curl_multi_select() instead of busy-looping on usleep().
    $running = null;
    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh, 1.0);
    } while ($running > 0);

    // Collect each response body, then release the handle.
    $res = array();
    foreach ($nodes as $i => $url){
        $res[$url] = curl_multi_getcontent($curl_array[$i]);
        curl_multi_remove_handle($mh, $curl_array[$i]);
        curl_close($curl_array[$i]);
    }
    curl_multi_close($mh);
    return $res;
}

Upvotes: 0

Views: 1144

Answers (1)

Fabiano Taioli
Fabiano Taioli

Reputation: 5540

You can use this class:

https://github.com/petewarden/ParallelCurl

It is a layer over curl_multi and supports setting a maximum number of simultaneous requests.
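
If you'd rather not pull in a library, the same idea can be sketched with plain curl_multi: process the URLs in batches of a fixed size, and re-queue any URL whose response came back blank, up to a retry limit. This is just an illustrative sketch, not ParallelCurl's API; the function name, `$batch_size`, and `$max_retries` are made up for the example.

```php
<?php
// Sketch: throttle concurrency by batching, and retry blank responses.
// Names here ($batch_size, $max_retries, the function itself) are
// illustrative; tune the numbers to what the target server tolerates.
function throttled_html_requests(array $nodes, $batch_size = 20, $max_retries = 3)
{
    $res = array();
    $pending = array_values($nodes);

    for ($attempt = 0; $attempt <= $max_retries && $pending; $attempt++) {
        $failed = array();
        foreach (array_chunk($pending, $batch_size) as $batch) {
            $mh = curl_multi_init();
            $handles = array();
            foreach ($batch as $url) {
                $ch = curl_init($url);
                curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
                curl_setopt($ch, CURLOPT_TIMEOUT, 30); // don't hang on a dead server
                curl_multi_add_handle($mh, $ch);
                $handles[$url] = $ch;
            }

            // Run this batch to completion, sleeping on socket activity.
            $running = null;
            do {
                curl_multi_exec($mh, $running);
                curl_multi_select($mh, 1.0);
            } while ($running > 0);

            foreach ($handles as $url => $ch) {
                $body = curl_multi_getcontent($ch);
                if ($body === '' || $body === null || $body === false) {
                    $failed[] = $url; // blank response: try again next round
                } else {
                    $res[$url] = $body;
                }
            }
            foreach ($handles as $ch) {
                curl_multi_remove_handle($mh, $ch);
                curl_close($ch);
            }
            curl_multi_close($mh);
        }
        $pending = $failed;
    }
    return $res; // URLs still blank after all retries are simply absent
}
```

With a batch size of 20 the server only ever sees 20 connections at once instead of 200, which is usually enough to stop it from dropping requests.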

Upvotes: 1
