user1647347

Reputation: 507

Simultaneous HTTP requests in PHP with cURL

I'm trying to take a rather large list of domains and query the rank of each using the compete.com API, as seen here -> https://www.compete.com/developer/documentation

The script I wrote takes a database of domains I populated and initiates a cURL request to compete.com for the rank of each website. I quickly realized that this was very slow because each request was sent one at a time. I did some searching and came across this post -> http://www.phpied.com/simultaneuos-http-requests-in-php-with-curl/ which explains how to perform simultaneous HTTP requests in PHP with cURL.

Unfortunately that script will take an array of 25,000 domains and try to process them all at once. I found that batches of 1,000 work quite well.

Any idea how to send 1,000 queries to compete.com, then wait for completion and send the next 1,000, until the array is empty? Here's what I'm working with thus far:

<?php

//includes
include('includes/mysql.php');
include('includes/config.php');

//get domains
$result = mysql_query("SELECT * FROM $tableName");
while($row = mysql_fetch_array($result)) {
    $competeRequests[] = "http://apps.compete.com/sites/" . $row['Domain'] . "/trended/rank/?apikey=xxx&start_date=201207&end_date=201208&jsonp=";
}

//first batch
$curlRequest = multiRequest($competeRequests);
$j = 0;
foreach ($curlRequest as $json){
    $j++;
    $json_output = json_decode($json, TRUE);
    $rank = $json_output['data']['trends']['rank'][0]['value'];

    if($rank) {
        //Create mysql query
        $query = "Update $tableName SET Rank = '$rank' WHERE ID  = '$j'";

        //Execute the query
        mysql_query($query);
        echo $query . "<br/>";
    }
}


function multiRequest($data) {
  // array of curl handles
  $curly = array();
  // data to be returned
  $result = array();

  // multi handle
  $mh = curl_multi_init();

  // loop through $data and create curl handles
  // then add them to the multi-handle
  foreach ($data as $id => $d) {

    $curly[$id] = curl_init();

    $url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;
    curl_setopt($curly[$id], CURLOPT_URL,            $url);
    curl_setopt($curly[$id], CURLOPT_HEADER,         0);
    curl_setopt($curly[$id], CURLOPT_RETURNTRANSFER, 1);

    // post?
    if (is_array($d)) {
      if (!empty($d['post'])) {
        curl_setopt($curly[$id], CURLOPT_POST,       1);
        curl_setopt($curly[$id], CURLOPT_POSTFIELDS, $d['post']);
      }
    }

    curl_multi_add_handle($mh, $curly[$id]);
  }

  // execute the handles; curl_multi_select() waits for socket
  // activity so the loop doesn't spin the CPU while requests run
  $running = null;
  do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
  } while ($running > 0);

  // get content and remove handles
  foreach($curly as $id => $c) {
    $result[$id] = curl_multi_getcontent($c);
    curl_multi_remove_handle($mh, $c);
  }

  // all done
  curl_multi_close($mh);

  return $result;

}
?>

Upvotes: 2

Views: 3191

Answers (2)

elixon

Reputation: 1292

https://github.com/webdevelopers-eu/ShadowHostCloak

This does exactly what you want. Just pass an empty argument to new Proxy() to bypass the proxy and make direct requests.

You can stuff 1,000 requests into it and call $proxy->execWait(), and it will process all the requests simultaneously and return from that method when everything is done... Then you can repeat.
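A rough sketch of how the batching loop might look with that library (only new Proxy() and execWait() come from the description above; the addRequest() method name is a guess, not the library's documented API, so check the repo before using it):

$proxy = new Proxy(''); // empty argument => no proxy, direct requests

foreach (array_chunk($competeRequests, 1000) as $batch) {
    foreach ($batch as $url) {
        $proxy->addRequest($url); // hypothetical method name
    }
    // blocks until every queued request in this batch has completed
    $proxy->execWait();
}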

Upvotes: 0

drew010

Reputation: 69977

Instead of

//first batch
$curlRequest = multiRequest($competeRequests);

$j = 0;
foreach ($curlRequest as $json){

You can do:

$curlRequest = array();

foreach (array_chunk($competeRequests, 1000) as $requests) {
    $results = multiRequest($requests);

    $curlRequest = array_merge($curlRequest, $results);
}

$j = 0;
foreach ($curlRequest as $json){
    $j++;
    // ...

This will split the large array into chunks of 1,000 and pass each chunk of 1,000 URLs to your multiRequest function, which uses cURL to execute those requests.
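If holding all 25,000 responses in memory at once is a concern, a small variation (just a sketch, reusing the question's multiRequest() and table layout) processes each batch's results before fetching the next:

$j = 0;
foreach (array_chunk($competeRequests, 1000) as $requests) {
    // run one batch of up to 1,000 simultaneous requests
    $results = multiRequest($requests);

    // handle this batch before starting the next one, so memory
    // use stays bounded by the batch size rather than the full list
    foreach ($results as $json) {
        $j++;
        $json_output = json_decode($json, TRUE);
        $rank = $json_output['data']['trends']['rank'][0]['value'];

        if ($rank) {
            mysql_query("UPDATE $tableName SET Rank = '$rank' WHERE ID = '$j'");
        }
    }
}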

Upvotes: 5
