user1050544

Reputation: 447

Website Performance with Hundreds of cURL Requests

Setup:

Nginx as static resource server and reverse proxy for PHP-FPM.

Our site has search/product pages that make a cURL request to fetch their information from an external source. The class below gets called on every page request. In production there are no echoes.

Basic PHP looks like:

class myCurl
{
    private $ch = NULL;

    public function __construct()
    {
        // One cURL handle per object, reused for every request made through it.
        $this->ch = curl_init();
    }

    public function __destruct()
    {
        if ($this->ch)
        {
            curl_close($this->ch);
            $this->ch = NULL;
        }
    }

    private function _callCurl($url)
    {
        curl_setopt($this->ch, CURLOPT_URL, $url);
        curl_setopt($this->ch, CURLOPT_TIMEOUT, 15);
        curl_setopt($this->ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($this->ch, CURLOPT_FOLLOWLOCATION, 1);
        $data = curl_exec($this->ch);

        // Debug output only; not present in production.
        if ($data === FALSE)
        {
            echo "No Data \n";
        }

        if (curl_errno($this->ch) === 28)
        {
            echo "Timeout \n";
        }

        // Return the decoded JSON array, or FALSE if the request or decode failed.
        if ($array = json_decode($data, TRUE))
        {
            return $array;
        }

        return FALSE;
    }
}

Our search and product pages get hit by bots a lot: Google, Yahoo, Bing.

When they are hitting us we sometimes see an increase in 502, 503 and 499 errors in the nginx access logs.

I am looking at a few options and need help deciding what to do.

Options:

1) Better cURL configuration... not sure how to do this, need help.
2) Move to sockets using PHP's fsockopen()... I have heard this is faster/more efficient.
3) Throw another server at our configuration to lighten the load.
4) Not sure... any suggestions?

These search results cannot be cached: they are unique, contain a large number of GET parameters, and are not really visited by regular users. We tell the bots not to crawl them through robots.txt, but they do not seem to care; we are seeing a lot of server errors in Google Webmaster Tools and it is hurting our index status and rankings.

During normal traffic we do not see as many errors.

For example, let's say that while the search bots are hitting us we are making hundreds of cURL requests per second and they are timing out. A cache cannot be used. What are the ways around this? Again, I am thinking another server.

Please help

Upvotes: 0

Views: 792

Answers (1)

Fleshgrinder

Reputation: 16253

I doubt that other PHP functions will be the solution to your problem. fsockopen() might be a little bit faster, but it's definitely not worth the effort if you already have it working with cURL. You should benchmark this on your actual server yourself; here's a nice little micro benchmark. Put your fsockopen() code in the f1() body and your cURL code in f2().

<?php

// Make sure any errors are displayed.
ini_set("display_errors", true);

// You can use the following construct to create an A/B benchmark.
define('LOOP', 10000);

function f1() {
  for ($i = 0; $i < LOOP; ++$i) {

  }
}

function f2() {
  for ($i = 0; $i < LOOP; ++$i) {

  }
}

$time1 = -microtime(true);
f1();
$time1 += microtime(true);

$time2 = -microtime(true);
f2();
$time2 += microtime(true);

var_dump($time1, $time2);
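
As a rough sketch of what might go into the two bodies, assuming a plain HTTP GET on port 80 (the function names, host, path and URL are placeholders, not your real endpoint):

<?php

// fsockopen() variant for f1(): speak HTTP/1.1 by hand and read the raw
// response (note this includes the headers, unlike the cURL variant).
function fetchWithFsockopen($host, $path) {
  $fp = fsockopen($host, 80, $errno, $errstr, 15);
  if ($fp === false) {
    return false;
  }
  fwrite($fp, "GET {$path} HTTP/1.1\r\nHost: {$host}\r\nConnection: close\r\n\r\n");
  $response = stream_get_contents($fp);
  fclose($fp);
  return $response;
}

// cURL variant for f2(): the same request handled by libcurl.
function fetchWithCurl($url) {
  $ch = curl_init($url);
  curl_setopt_array($ch, array(
    CURLOPT_TIMEOUT        => 15,
    CURLOPT_RETURNTRANSFER => 1,
  ));
  $data = curl_exec($ch);
  curl_close($ch);
  return $data;
}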

DNS / hosts / IP

First of all, access your endpoints directly via IP, or at least enter the IPs in your hosts file for the domain. This removes the DNS lookups, or at least speeds them up.
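
A hosts entry is just a line like 203.0.113.10 api.example.com in /etc/hosts (IP and host name are placeholders). If you would rather keep it inside PHP, a reasonably recent libcurl/PHP also lets you pin the IP per handle with CURLOPT_RESOLVE; again, the values here are only illustrative:

<?php

$ch = curl_init('http://api.example.com/search?q=foo'); // placeholder endpoint
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_TIMEOUT        => 15,
    // "host:port:ip" entries skip the DNS lookup for this handle.
    CURLOPT_RESOLVE        => array('api.example.com:80:203.0.113.10'),
));
$data = curl_exec($ch);
curl_close($ch);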

Code Optimization

Secondly, reduce the number of function calls you make, maybe with something like:

<?php

class myCurl {

    private $ch = NULL;

    protected static $options = array(
        CURLOPT_TIMEOUT        => 15,
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_FOLLOWLOCATION => 1,
    );

    public function __construct() {
        $this->ch = curl_init();
        curl_setopt_array($this->ch, static::$options);
    }

    public function __destruct() {
        curl_close($this->ch);
    }

    private function _callCurl($url) {
        curl_setopt($this->ch, CURLOPT_URL, $url);
        $data = json_decode(curl_exec($this->ch), TRUE);
        if ($data === (array) $data) {
            return $data;
        }
        return FALSE;
    }
}

Of course this implementation assumes that your object is very ephemeral, which I don't know. If it isn't, I'd go with an even simpler one-shot method that frees its resources immediately.

<?php

final class MyCurl {

    protected static $options = array(
        CURLOPT_TIMEOUT        => 15,
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_FOLLOWLOCATION => 1,
    );

    protected function curlGET($url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, static::$options);
        $data = json_decode(curl_exec($ch), true);
        curl_close($ch);
        if ($data === (array) $data) {
            return $data;
        }
        return false;
    }

}

Keeping the connection to the target server open for the whole life of the object isn't a good idea if you can't reuse it. It's in fact much better to connect, get your data and close the connection right away, keeping connections free for other processes and other clients.

OS Optimization

I'd also recommend optimizing things like this at the OS level; the above tip with DNS/hosts is just a start. Tuning your TCP kernel configuration can speed things up a bit. You may find the following repository I made some (long) time ago helpful:

https://github.com/Fleshgrinder/sysctl.d/blob/master/sysctl.d.conf
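
For example (the values below are only illustrative, what actually fits depends on your traffic and hardware, and the repository contains a far more complete set):

# Widen the local port range so more concurrent outgoing connections are possible.
net.ipv4.ip_local_port_range = 1024 65535

# Shorten how long orphaned connections linger in FIN-WAIT-2.
net.ipv4.tcp_fin_timeout = 15

# Allow reusing sockets in TIME-WAIT for new outgoing connections; many
# short-lived requests (one cURL call per page view) pile these up quickly.
net.ipv4.tcp_tw_reuse = 1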

Upvotes: 1
