Reputation: 1862
What is the fastest way to get the HTTP status code for a list of URLs? I have a list of about 10k URLs to check, and ideally it would check them every 15 minutes. So I have a PHP script that loops through them all using simple cURL functions, but it takes way too much time. Any suggestions on what I can do to improve that? What about checking multiple URLs in parallel? How many could PHP manage? I'm very new to this whole performance topic.
This is what I have:
public function getHttpStatus(array $list) {
    // $list contains 10k+ URLs from the database.
    foreach ($list as $url) {
        $ch = curl_init($url); // each URL needs its own cURL handle
        curl_setopt($ch, CURLOPT_NOBODY, true); // HEAD request: only the status code is needed
        curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
        curl_exec($ch);
        $info = curl_getinfo($ch);
        echo $info['http_code'] . '<br />';
        curl_close($ch); // free the handle before the next iteration
    }
}
Thanks in advance!
Upvotes: 1
Views: 5710
Reputation: 71384
You might consider using curl_multi_exec() (http://php.net/manual/en/function.curl-multi-exec.php), which allows you to process multiple cURL handles in parallel. If you like, you can take a look at a very lightweight REST client I wrote which supports curl_multi_exec(): https://github.com/mikecbrant/php-rest-client
Now, I didn't set up this library to work with HEAD requests, which would actually be much more efficient than GET requests if you are only looking for response codes. But it should be relatively easy to modify to support such a use case.
At the very least, this REST client library can give you good sample code for working with curl_multi_exec().
Obviously, you would need to experiment with the number of concurrent requests based on what your available hardware and the services you are making requests against can handle.
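For illustration, here is a minimal sketch of that pattern: checking status codes in parallel batches with curl_multi_exec(), using HEAD requests as suggested above. The function name, the batch size of 50, and the 10-second timeout are illustrative choices, not part of the library linked above.

function checkStatusCodes(array $urls, $batchSize = 50) {
    $results = array();
    foreach (array_chunk($urls, $batchSize) as $batch) {
        $mh = curl_multi_init();
        $handles = array();
        foreach ($batch as $url) {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_NOBODY, true);        // HEAD request: status code only
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
            curl_setopt($ch, CURLOPT_TIMEOUT, 10);         // keep one slow host from stalling the batch
            curl_multi_add_handle($mh, $ch);
            $handles[$url] = $ch;
        }
        // Drive all transfers in this batch to completion.
        do {
            $status = curl_multi_exec($mh, $running);
            if ($running) {
                curl_multi_select($mh);                    // wait for activity instead of busy-looping
            }
        } while ($running && $status === CURLM_OK);
        // Collect status codes and release the handles.
        foreach ($handles as $url => $ch) {
            $results[$url] = curl_getinfo($ch, CURLINFO_HTTP_CODE);
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);
    }
    return $results;
}

Each batch reuses one multi handle, and curl_multi_select() blocks until at least one transfer has activity, so the loop doesn't spin the CPU. With ~10k URLs every 15 minutes, the batch size is the main knob to tune against your bandwidth and the target servers.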
Upvotes: 3