Reputation: 507
I'm trying to take a rather large list of domains and query the rank of each using the compete.com API, as seen here -> https://www.compete.com/developer/documentation
The script I wrote takes a database of domains I populated and initiates a cURL request to compete.com for the rank of each website. I quickly realized that this was very slow because the requests were being sent one at a time. I did some searching and came across this post -> http://www.phpied.com/simultaneuos-http-requests-in-php-with-curl/ which explains how to perform simultaneous HTTP requests in PHP with cURL.
Unfortunately, that script will take an array of 25,000 domains and try to process them all at once. I found that batches of 1,000 work quite well.
Any idea how to send 1,000 queries to compete.com, wait for completion, and then send the next 1,000 until the array is empty? Here's what I'm working with thus far:
<?php
// includes
include('includes/mysql.php');
include('includes/config.php');

// get domains
$competeRequests = array();
$result = mysql_query("SELECT * FROM $tableName");
while ($row = mysql_fetch_array($result)) {
    $competeRequests[] = "http://apps.compete.com/sites/" . $row['Domain'] . "/trended/rank/?apikey=xxx&start_date=201207&end_date=201208&jsonp=";
}

// first batch
$curlRequest = multiRequest($competeRequests);
$j = 0;
foreach ($curlRequest as $json) {
    $j++;
    $json_output = json_decode($json, TRUE);
    $rank = $json_output['data']['trends']['rank'][0]['value'];
    if ($rank) {
        // Create the mysql query
        $query = "UPDATE $tableName SET Rank = '$rank' WHERE ID = '$j'";
        // Execute the query
        mysql_query($query);
        echo $query . "<br/>";
    }
}
function multiRequest($data) {
    // array of curl handles
    $curly = array();
    // data to be returned
    $result = array();

    // multi handle
    $mh = curl_multi_init();

    // loop through $data and create curl handles,
    // then add them to the multi-handle
    foreach ($data as $id => $d) {
        $curly[$id] = curl_init();
        $url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;
        curl_setopt($curly[$id], CURLOPT_URL, $url);
        curl_setopt($curly[$id], CURLOPT_HEADER, 0);
        curl_setopt($curly[$id], CURLOPT_RETURNTRANSFER, 1);

        // post?
        if (is_array($d) && !empty($d['post'])) {
            curl_setopt($curly[$id], CURLOPT_POST, 1);
            curl_setopt($curly[$id], CURLOPT_POSTFIELDS, $d['post']);
        }

        curl_multi_add_handle($mh, $curly[$id]);
    }

    // execute the handles
    $running = null;
    do {
        curl_multi_exec($mh, $running);
    } while ($running > 0);

    // get content and remove handles
    foreach ($curly as $id => $c) {
        $result[$id] = curl_multi_getcontent($c);
        curl_multi_remove_handle($mh, $c);
    }

    // all done
    curl_multi_close($mh);
    return $result;
}
?>
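(Side note: the do/while in multiRequest() spins the CPU while transfers are running. A minimal sketch of a gentler wait loop using PHP's curl_multi_select(), which blocks until at least one handle has activity, would be:

// Drop-in replacement for the "execute the handles" loop above:
// curl_multi_select() sleeps until a transfer has activity or the
// 1-second timeout elapses, instead of spinning at full speed.
$running = null;
do {
    $status = curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh, 1.0);
    }
} while ($running > 0 && $status == CURLM_OK);

Everything else in the function stays the same.)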
Upvotes: 2
Views: 3191
Reputation: 1292
https://github.com/webdevelopers-eu/ShadowHostCloak
This does exactly what you want. Just pass an empty argument to new Proxy() to bypass the proxy and make direct requests.
You can stuff 1,000 requests into it and call $proxy->execWait(); it will process all of the requests simultaneously and return from that method when everything is done. Then you can repeat.
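Usage might look roughly like the sketch below; only new Proxy('') and execWait() come from this answer, and the addRequest() method name is a guess, so check the repository's README for the library's real queuing API:

<?php
// Hypothetical sketch -- only Proxy('') and execWait() are named above;
// addRequest() is a placeholder for whatever method queues a request.
// (Autoloading of the library is omitted; see the repository.)
foreach (array_chunk($competeRequests, 1000) as $batch) {
    $proxy = new Proxy(''); // empty argument = no proxy, direct requests
    foreach ($batch as $url) {
        $proxy->addRequest($url); // hypothetical queuing call
    }
    $proxy->execWait(); // returns once all queued requests are done
}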
Upvotes: 0
Reputation: 69977
Instead of
//first batch
$curlRequest = multiRequest($competeRequests);
$j = 0;
foreach ($curlRequest as $json) {
You can do:
$curlRequest = array();
foreach (array_chunk($competeRequests, 1000) as $requests) {
    $results = multiRequest($requests);
    $curlRequest = array_merge($curlRequest, $results);
}
$j = 0;
foreach ($curlRequest as $json) {
    $j++;
    // ...
This will split the large array into chunks of 1,000 and pass each chunk of 1,000 values to your multiRequest function, which uses cURL to execute those requests.
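If holding all 25,000 responses in memory at once is a concern, a variation is to decode and save each batch as soon as it returns rather than merging everything first; a sketch reusing the question's multiRequest() and table layout:

$j = 0;
foreach (array_chunk($competeRequests, 1000) as $requests) {
    // decode and persist this batch before starting the next one
    foreach (multiRequest($requests) as $json) {
        $j++;
        $json_output = json_decode($json, TRUE);
        $rank = $json_output['data']['trends']['rank'][0]['value'];
        if ($rank) {
            mysql_query("UPDATE $tableName SET Rank = '$rank' WHERE ID = '$j'");
        }
    }
}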
Upvotes: 5