Reputation: 39386
I'm using a simple PHP library to add documents to a SOLR index, via HTTP.
There are 3 servers involved, currently: the PHP box, the Solr box, and the database box.
At 80 documents/sec (out of 1 million docs), I'm noticing an unusually high interrupt rate on the network interfaces on the PHP and solr boxes (2000/sec; what's more, the graphs are nearly identical -- when the interrupt rate on the PHP box spikes, it also spikes on the Solr box), but much less so on the database box (300/sec). I imagine this is simply because I open and reuse a single connection to the database server, but every single Solr request is currently opening a new HTTP connection via cURL, thanks to the way the Solr client library is written.
So, my question is: can the cURL library be made to keep the HTTP connection open and reuse it across requests (i.e. HTTP keep-alive / persistent connections)?
Upvotes: 65
Views: 85450
Reputation: 92792
cURL PHP documentation (curl_setopt) says:
CURLOPT_FORBID_REUSE
TRUE to force the connection to explicitly close when it has finished processing, and not be pooled for reuse.
So: reuse the same cURL handle for every request and don't set CURLOPT_FORBID_REUSE; cURL will then keep the connection open and reuse it automatically (as long as the server supports keep-alive).
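As a minimal sketch of that idea (the Solr URL here is a placeholder, not from the question):

// Create one handle up front and reuse it for every request.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://solr-box:8983/solr/update'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
// This is already the default; shown only for emphasis: allow pooling.
curl_setopt($ch, CURLOPT_FORBID_REUSE, false);

foreach ($docs as $xml) {
    curl_setopt($ch, CURLOPT_POSTFIELDS, $xml);
    curl_exec($ch); // subsequent requests reuse the same TCP connection
}

curl_close($ch);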
Upvotes: 61
Reputation: 257
Curl sends the keep-alive header by default, but:
1. create a context using curl_init() without any parameters.
2. use the CURLOPT_URL option to pass the url to the context
3. execute the request with curl_exec()
4. don't call curl_close() until all your requests are done
A very basic example:

function get($url) {
    global $context;
    curl_setopt($context, CURLOPT_URL, $url);
    return curl_exec($context);
}

$context = curl_init();
// Return the response body instead of echoing it.
curl_setopt($context, CURLOPT_RETURNTRANSFER, true);

// multiple calls to get() here
curl_close($context);
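Because curl_close() is only called once at the end, every call to get() goes through the same handle, and cURL reuses the underlying TCP connection whenever the server allows keep-alive.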
Upvotes: 24
Reputation: 3819
On the server you are accessing, keep-alive must be enabled and the maximum number of keep-alive requests should be reasonable. In the case of Apache, see the KeepAlive, MaxKeepAliveRequests and KeepAliveTimeout directives in the Apache docs.
You have to be re-using the same cURL handle.
When configuring the cURL handle, enable keep-alive with a timeout in the request headers:
curl_setopt($curlHandle, CURLOPT_HTTPHEADER, array(
    'Connection: Keep-Alive',
    'Keep-Alive: 300'
));
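Putting it together, a minimal sketch (the Solr URL is a placeholder, not from the original answer):

$curlHandle = curl_init('http://solr-box:8983/solr/update'); // placeholder URL
curl_setopt($curlHandle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlHandle, CURLOPT_HTTPHEADER, array(
    'Connection: Keep-Alive',
    'Keep-Alive: 300'   // ask the server to hold the connection open for 300s
));
// ...issue as many requests as needed on this one handle...
curl_close($curlHandle);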
Upvotes: 16
Reputation: 23722
If you don't care about the response from the request, you can issue the requests asynchronously, but you run the risk of overloading your SOLR index. I doubt that would happen, though; SOLR is pretty damn quick.
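A rough sketch of the asynchronous approach using PHP's curl_multi API (not from the original answer; the URL and $docs payload are placeholders):

$mh = curl_multi_init();
$handles = array();

foreach ($docs as $xml) {
    $ch = curl_init('http://solr-box:8983/solr/update'); // placeholder URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $xml);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Drive all transfers to completion without blocking on each one.
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for activity instead of busy-looping
} while ($running > 0);

foreach ($handles as $ch) {
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);

For a million documents you would want to batch these (say, a few dozen handles at a time) rather than adding them all to the multi handle at once.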
Upvotes: 1