Reputation: 2108
I have a group of URLs that redirect, and I want to use curl_multi to speed up the process of getting the final URL. But it seems that curl_multi hasn't finished following the redirects when I access CURLINFO_EFFECTIVE_URL, because it returns the original URL.
function processUrls($urls){
    $handlers = [];
    $cleanUrls = [];
    $mh = curl_multi_init();
    foreach ($urls as $key => $url){
        $handlers[$key] = curl_init();
        curl_setopt($handlers[$key], CURLOPT_URL, $url);
        curl_setopt($handlers[$key], CURLOPT_HEADER, true);
        curl_setopt($handlers[$key], CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($handlers[$key], CURLOPT_RETURNTRANSFER, true);
        curl_setopt($handlers[$key], CURLOPT_NOBODY, true);
        curl_multi_add_handle($mh, $handlers[$key]);
    }
    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh);
    } while ($running > 0);
    foreach($handlers as $key => $ch){
        $cleanUrls[$key] = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    }
    curl_multi_close($mh);
    return $cleanUrls;
}
I think the issue is this code:
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);
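For comparison, the drive loop in the PHP manual's curl_multi example also checks the status returned by curl_multi_exec(); a minimal sketch of that more defensive pattern (not a confirmed fix for this problem):

do {
    $status = curl_multi_exec($mh, $running);
    if ($running) {
        // block until there is activity on any handle instead of spinning
        curl_multi_select($mh);
    }
} while ($running && $status == CURLM_OK);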
Upvotes: 0
Views: 169
Reputation: 5661
There is a better, more efficient, and flexible approach using stream_socket_client(). There is no per-URL delay: a slow response from one URL will not affect the other URLs' response times. The responses come back on a first-come, first-served basis. You could still process the responses sequentially if necessary.
If I understand correctly, you want to make HTTP requests whose response times are unpredictable or too long to run consecutively, and you would like to run the requests concurrently.
I do this when running W3C validation tools. I do CSS validation, HTML validation, and XHTML validation. (I like my code to use XHTML as much as possible and only use HTML5 when necessary. An old W3C mobile best practice habit.)
Before I transmit the <body> of the HTML, I start the concurrent requests using stream_socket_client().
This is about as close to multi-tasking as PHP gets. This is actual working code that I have been using for a few years.
$url is the fully qualified URL for the page under test, e.g. http://example.com/index.html:
$url = $_POST['url'];
$webPageTestKey = ' [key for WebPageTest.org goes here] ';
$timeout = 120;           // seconds for stream_select()
$result = array();        // raw HTTP responses, keyed by socket id
$sockets = array();       // open request sockets
$buffer_size = 8192;      // responses are read 8K at a time
$id = 0;
$urls = array();
$path = $url;
$url = urlencode("$url"); // the page URL is passed as a query parameter
The request URLs are stored in the $urls[] array:
$urls[] = array('host' => "jigsaw.w3.org",'path' => "/css-validator/validator?uri=$url&profile=css3&usermedium=all&warning=no&lang=en&output=text");
$urls[] = array('host' => "validator.w3.org",'path' => "/check?uri=$url&charset=%28detect+automatically%29&doctype=Inline&group=0&output=json");
$urls[] = array('host' => "validator.w3.org",'path' => "/check?uri=$url&charset=%28detect+automatically%29&doctype=XHTML+Basic+1.1&group=0&output=json");
$urls[] = array('host' => "www.webpagetest.org",'path' => "/runtest.php?f=xml&bwDown=10000&bwUp=1500&latency=40&fvonly=1&k=$webPageTestKey&url=$url");
Sockets need the host and path separately. If you cannot easily see the format of the URLs, dump the array with var_export($urls);
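For the example URL above, the dump would look roughly like this (var_export output, abbreviated to the first entry):

array (
  0 =>
  array (
    'host' => 'jigsaw.w3.org',
    'path' => '/css-validator/validator?uri=http%3A%2F%2Fexample.com%2Findex.html&profile=css3&usermedium=all&warning=no&lang=en&output=text',
  ),
  // ... entries 1-3 follow the same host/path shape
)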
continued:
$err = '';
foreach($urls as $path){
    $host = $path['host'];
    $path = $path['path'];
    // Minimal HTTP/1.0 request; with HTTP/1.0 the server closes the
    // connection when done, which the read loop below relies on.
    $http = "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
    $stream = stream_socket_client("$host:80", $errno, $errstr, 120,
        STREAM_CLIENT_ASYNC_CONNECT|STREAM_CLIENT_CONNECT);
    if ($stream) {
        $sockets[] = $stream;       // supports multiple sockets
        $result[] = '';             // pre-fill so the .= below never hits an unset index
        $start[] = microtime(true); // per-request start time
        fwrite($stream, $http);
    }
    else {
        $err .= "$id Failed<br>\n";
    }
}
The Request Sockets are stored in array $sockets[]
Then I transmit HTML while waiting for the requests to complete.
Responses come back in the order in which they are received; the order in which the requests were made does not matter.
Responses are read in via an 8K buffer. If a response is larger than 8K, multiple chunks are retrieved, 8K at a time.
$write = NULL;  // we only watch for readable sockets
$except = NULL;
while (count($sockets)) {
    $read = $sockets;
    stream_select($read, $write, $except, $timeout);
    if (count($read)) {
        foreach ($read as $r) {
            $id = array_search($r, $sockets);
            $data = fread($r, $buffer_size);
            if (strlen($data) == 0) {
                // zero-length read = EOF: the server closed the connection
                $closed[$id] = microtime(true); // not necessary
                fclose($r);
                unset($sockets[$id]);
                // check $result[$id] for redirect here
            }
            else {
                $result[$id] .= $data;
            }
        }
    }
    else {
        // stream_select() timed out with nothing to read
        // echo 'Timeout: ' . date('h:i:s') . "\n\n\n";
        break;
    }
}
The HTTP Responses are stored in the $result[] array.
You will have to add a search of the response for a redirect (the Location header) and then make the subsequent request yourself.
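A minimal sketch of that check, assuming the status line and headers sit at the start of $result[$id] (the follow-up request itself would reuse the foreach logic above):

// Sketch: detect a redirect in a completed response.
if (preg_match('/^HTTP\/1\.[01] 30[1278]/', $result[$id]) &&
    preg_match('/^Location:\s*(\S+)/mi', $result[$id], $m)) {
    $redirectTo = $m[1];
    // Parse $redirectTo into host and path, open a new async socket
    // as in the foreach loop above, and append it to $sockets[]
    // so the read loop picks it up.
}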
This code gives you full control. Nothing hidden, no unknowns.
If you want to give up some control for ease of use, make the request to your own script and use regular curl to do the requests and redirects.
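A minimal single-handle sketch of that fallback, using only standard curl options:

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // let curl chase the redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);         // headers only; we just want the final URL
curl_exec($ch);
$finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close($ch);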
Upvotes: 1