Farzher

Reputation: 14563

PHP - is using multi_curl for parallel processing a good idea?

I needed parallel processing in PHP, but PHP doesn't support it without installing extensions, so I'm using multi_curl to achieve this.

main.php - Builds an array of URLs, all pointing to process.php with different $_GET parameters, then executes them all using multi_curl.

process.php - The processing logic for each thread.

I'd just like to know if this is a viable way of doing things. Is this a bad approach? Does it cause a lot of overhead? Is there a more sensible way of doing this? Thanks.
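For reference, a minimal sketch of the main.php side described above might look like the following; the process.php location and its "task" parameter are placeholders, not taken from the actual project:

    // Hedged sketch of the main.php approach: build the URLs, run them with curl_multi.
    $urls = array();
    foreach (range(1, 10) as $task) {
        $urls[] = 'http://localhost/process.php?task=' . $task;
    }

    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch;
    }

    // Drive all transfers; the web server services the requests concurrently.
    $running = null;
    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh); // avoid busy-waiting
    } while ($running > 0);

    // Collect the output of every "thread" and clean up.
    $results = array();
    foreach ($handles as $ch) {
        $results[] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);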

Upvotes: 2

Views: 754

Answers (4)

FrancescoMM

Reputation: 2960

If you are running PHP on a webserver (and multi_curl may be unavailable), one way to run scripts in parallel without libraries is to open sockets to localhost:80 and manually make the webserver run the scripts you want. They run in parallel thanks to the web server's own multithreading. Then, in a loop, you collect all the results; when all of them are done (or after a timeout of your choice), you move on.

This is a piece of code taken from a script that retrieves the sizes of all images referenced on a webpage.

The get_img_size.php script retrieves the size and info of one image.

$sockets[] is an array that keeps one socket for every image to test.

    foreach($metaItems['items'] as $uCnt=>$uVal) {
        $metaItem=ContentLoader::splitOneNew($metaItems,$uCnt);
        $AnImage=$metaItem['url'];

        // One socket per image, opened back to this same web server
        $sockets[$AnImage] = fsockopen($_SERVER['HTTP_HOST'], 80, $errno, $errstr, 30);
        if(!$sockets[$AnImage]) {
            echo "$errstr ($errno)<br />\n";
        } else {
            // Ask the server to run get_img_size.php for this image,
            // then continue without waiting for the response
            $pathToRetriever=dirname($_SERVER['PHP_SELF']).'/tools/get_img_size.php?url='.rawurlencode($AnImage);
            $out = "GET $pathToRetriever HTTP/1.1\r\n";
            $out .= "Host: ".$_SERVER['HTTP_HOST']."\r\n";
            $out .= "Connection: Close\r\n\r\n";
            fwrite($sockets[$AnImage], $out);
            fflush($sockets[$AnImage]);
        }
    }

After this you can do other work while the "threads" keep running; then, in a loop, you read from each of the $sockets[] and test for EOF. In the example, much later in the code (inside a loop over each $AnImage):

    if(isset($sockets[$AnImage])) {
        if(feof($sockets[$AnImage])) {
            // This "thread" has finished: read whatever it sent back
            if(!isset($sizes[$AnImage])) $sizes[$AnImage]='';
            $sizes[$AnImage].=fgets($sockets[$AnImage], 4096);

            fclose($sockets[$AnImage]);
            unset($sockets[$AnImage]);

            // The script's own helper; returns an array with width [0],
            // height [1] and 'mime', or a non-array on failure
            $mysizes=ContentLoader::cleanResponse($sizes[$AnImage]);

            if(!is_array($mysizes)) {continue;}

            // Keep only images above a minimum size
            if($mysizes[0]>64 && $mysizes[1]>64 && ($mysizes[0]>128 || $mysizes[1]>128))
                $FoundImagePaths2[]=array('kind'=>'image','url'=>$AnImage,'ext'=>$ext,'width'=>$mysizes[0],'height'=>$mysizes[1],'mime'=>$mysizes['mime']);
        }
    }

It is not efficient in terms of memory, processes, or speed, but if a single image takes a few seconds to test, a whole page with 20+ images takes roughly the same few seconds to test them all. It is, in effect, a form of parallel PHP.

Upvotes: 0

Joe Watkins

Reputation: 17148

https://github.com/krakjoe/pthreads

Threading for PHP ...

Enjoy ...

To install on unix, you'll need a Thread Safe (ZTS) version of PHP. Most distros do not package this version, so you'll have to build it yourself.

A quick description of how to do so would be:

cd /usr/src
wget http://php.net/get/php-5.3.17.tar.bz2/from/us.php.net/mirror -O php-5.3.17.tar.bz2
tar -xf php-5.3.17.tar.bz2
cd php-5.3.17/ext
wget https://github.com/krakjoe/pthreads/tarball/master -O pthreads.tar.gz
tar -xf pthreads.tar.gz
mv krakjoe-pthreads* pthreads
cd ../
./buildconf --force
./configure --enable-maintainer-zts --enable-pthreads --prefix=/usr
make
make install

I'd start with that, building an isolated copy: pass --prefix a private location, like --prefix=/home/mydir, or use something like /usr/src/debug, which some distros have and which is a good place for that sort of thing. You'll obviously want to add --with-mysql and the like, but how you do that depends on your system (hint: you can use php -i | grep configure > factory.config to save your current PHP installation's configure line and base your custom build off that, knowing that any libraries it complains aren't available are an apt-get|yum install away).
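Once the extension is built and loaded, a minimal sketch of what using it might look like is below; the class and property names are illustrative, not from the question:

    // Hedged example: a Thread subclass whose run() executes in its own thread.
    class SquareTask extends Thread {
        public $result;
        private $n;

        public function __construct($n) {
            $this->n = $n;
        }

        public function run() {
            // Runs in a separate thread once start() is called
            $this->result = $this->n * $this->n;
        }
    }

    $threads = array();
    foreach (range(1, 4) as $n) {
        $threads[$n] = new SquareTask($n);
        $threads[$n]->start();
    }

    foreach ($threads as $n => $t) {
        $t->join();
        echo "square($n) = {$t->result}\n";
    }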

Upvotes: 1

ddinchev

Reputation: 34673

Bearing in mind that PHP does not support multi-processing by any reasonable built-in means, multi_curl seems to be a good solution in your case!

Upvotes: 0

Jon

Reputation: 437336

Of course it's a viable way of doing things in general, that's why the functionality exists.

As always, the devil is in the details. Multiple concurrent requests will compete with other processes for server resources and consume them; you will want to regulate the degree of concurrency.
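One rough sketch of how you might cap that concurrency is to feed curl_multi the URLs in fixed-size chunks; here $urls stands for the list built in main.php and the batch size of 5 is arbitrary:

    // Hedged sketch: limit concurrency by processing the URL list in batches.
    $batchSize = 5;
    foreach (array_chunk($urls, $batchSize) as $batch) {
        $mh = curl_multi_init();
        $handles = array();
        foreach ($batch as $url) {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_multi_add_handle($mh, $ch);
            $handles[] = $ch;
        }

        // Run this batch to completion before starting the next one
        $running = null;
        do {
            curl_multi_exec($mh, $running);
            curl_multi_select($mh);
        } while ($running > 0);

        foreach ($handles as $ch) {
            $response = curl_multi_getcontent($ch); // handle each result here
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);
    }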

Upvotes: 0
