Yourguide

Reputation: 109

PHP CURL multi-threaded and single-threaded function help. How do I do this?

I found a function here: http://archevery.blogspot.com/2013/07/php-curl-multi-threading.html

I am using it to send an array of URLs to run and process as quickly as possible via Multi-threaded curl requests. This works great.

SOME of the URLs I want to send must be processed in sequence, one after another, not at the same time.

How can I achieve this?

Example:

URL-A URL-B URL-C --> All fire off at the same time

URL-D URL-E --> Must wait for URL-D to finish before URL-E is triggered.

My purpose is for a task management system that allows me to add PHP applications as "Tasks" in the database. I have a header/detail relationship with the tasks so a task with one header and one detail can be sent off multi-threaded, but a task with one header and multiple details must be sent off in the order of the detail tasks.

I can do this by calling curl requests in a loop, but I also want the base request (the first task of a sequence) to fire as part of the multi-threaded function. I don't want to wait for all sequential tasks to pile up and process in order: the first task of each sequence should be multi-threaded, and each later task in that sequence should wait for the previous one to complete before it starts.

I tried sending the multi-detail tasks to a single-threaded function (shown further down), but it waits for each task to finish before moving on to the next. I need to somehow combine the multi-threaded function from the URL above with it. Here is my multi-threaded curl function:

function runRequests($url_array, $thread_width = 10) {
    $threads = 0;
    $master = curl_multi_init();
    $curl_opts = array(CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS => 5,
        CURLOPT_CONNECTTIMEOUT => 15,
        CURLOPT_TIMEOUT => 15);
    $results = array();
    $count = 0;
    foreach($url_array as $url) {
        $ch = curl_init();
        // merge the URL into the shared options instead of overwriting them
        curl_setopt_array($ch, $curl_opts + array(CURLOPT_URL => $url));
        curl_multi_add_handle($master, $ch); //push URL for single rec send into curl stack
        $results[$count] = array("url" => $url, "handle" => $ch);
        $threads++;
        $count++;
        if($threads >= $thread_width) { //start running when stack is full to width
            while($threads >= $thread_width) {
                //usleep(100);
                while(($execrun = curl_multi_exec($master, $running)) === -1){}
                curl_multi_select($master);
                // a request was just completed - find out which one and remove it from stack
                while($done = curl_multi_info_read($master)) {
                    foreach($results as &$res) {
                        if($res['handle'] == $done['handle']) {
                            $res['result'] = curl_multi_getcontent($done['handle']);
                        }
                    }
                    curl_multi_remove_handle($master, $done['handle']);
                    curl_close($done['handle']);
                    $threads--;
                }
            }
        }
    }
    do { //finish sending remaining queue items when all have been added to curl
        //usleep(100);
        while(($execrun = curl_multi_exec($master, $running)) === -1){}
        curl_multi_select($master);
        while($done = curl_multi_info_read($master)) {
            foreach($results as &$res) {
                if($res['handle'] == $done['handle']) {
                    $res['result'] = curl_multi_getcontent($done['handle']);
                }
            }
            curl_multi_remove_handle($master, $done['handle']);
            curl_close($done['handle']);
            $threads--;
        }
    } while($running > 0);
    curl_multi_close($master);
    return $results;
}

And here is my single-threaded curl function:

function runSingleRequests($url_array) {
    $results = array();
    foreach($url_array as $url) {
        // Initialize a cURL session.
        $ch = curl_init();

        // Return the response instead of echoing it (a value of 0 prints it to output).
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

        // Set the URL to request.
        curl_setopt($ch, CURLOPT_URL, $url);

        // Process the request; curl_exec() blocks until it completes.
        $results[] = curl_exec($ch);
        curl_close($ch);
    }
    return $results;
}

Both take an array of URLs as their input.

I currently have an array of all single tasks and another array of all multiple tasks with a "header id" that lets me know what header task each detail task is part of.

Any help on theory or code would be most appreciated. Thanks!

Upvotes: 1

Views: 1115

Answers (2)

user1805543

Here's an easier-to-follow example, from http://arguments.callee.info/2010/02/21/multiple-curl-requests-with-php/

The curl_multi_* family of functions, starting with curl_multi_init, allows you to combine cURL handles and execute them simultaneously.

EXAMPLE

build the individual requests, but do not execute them

$ch_1 = curl_init('http://webservice.one.com/');
$ch_2 = curl_init('http://webservice.two.com/');
curl_setopt($ch_1, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch_2, CURLOPT_RETURNTRANSFER, true);

build the multi-curl handle, adding both $ch

$mh = curl_multi_init();
curl_multi_add_handle($mh, $ch_1);
curl_multi_add_handle($mh, $ch_2);

execute all queries simultaneously, and continue when all are complete

  $running = null;
  do {
    curl_multi_exec($mh, $running);
    if ($running) {
      curl_multi_select($mh); // wait for activity instead of spinning the CPU
    }
  } while ($running);

close the handles

curl_multi_remove_handle($mh, $ch_1);
curl_multi_remove_handle($mh, $ch_2);
curl_multi_close($mh);

all of our requests are done, we can now access the results

$response_1 = curl_multi_getcontent($ch_1);
$response_2 = curl_multi_getcontent($ch_2);
echo "$response_1 $response_2"; // output results

If both websites take one second to return, we cut our page load time roughly in half by running the requests simultaneously instead of one after the other!
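The steps above can be wrapped into one reusable function. This is only a minimal sketch: the name `fetchAll` and the idea of returning responses under the same keys as the input array are my own assumptions, not part of the linked article.

```php
<?php
// Hypothetical helper (name and return shape are assumptions for illustration).
// Fetches all URLs concurrently and returns the bodies keyed like the input.
function fetchAll(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];

    // Build the individual requests and attach them to the multi handle.
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    $running = null;
    do {
        curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh); // wait for activity instead of spinning
        }
    } while ($running);

    // Collect the responses, then release every handle.
    $responses = [];
    foreach ($handles as $key => $ch) {
        $responses[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $responses;
}
```

Called as `fetchAll(['one' => 'http://webservice.one.com/', 'two' => 'http://webservice.two.com/'])`, it should give back an array with the keys `one` and `two`. Note that it only returns once every request has finished, so on its own it does not solve the ordering part of the question.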

References: https://www.php.net/manual/en/function.curl-multi-init.php

Upvotes: 1

Tschallacka

Reputation: 28722

Why don't you use a rudimentary task scheduler to schedule your requests and follow-ups, instead of running everything at once?

See it in action: https://ideone.com/suTUBS

<?php
class Task 
{
    protected $follow_up = [];
    protected $task_callback;

    public function __construct($task_callback) 
    {
        $this->task_callback = $task_callback;
    }

    public function addFollowUp(Task $follow_up) 
    {
        $this->follow_up[] = $follow_up;
    }

    public function complete() 
    {
        foreach($this->follow_up as $runnable) {
            $runnable->run();
        }
    }

    public function run() 
    {
        $callback = $this->task_callback;

        $callback($this);
    }
}



$provided_task_scheduler_from_somewhere = function() 
{
    $tasks = [];

    $global_message_thing = 'failed';

    $second_global_message_thing = 'failed';

    $task1 = new Task(function (Task $runner) 
    {
        $something_in_closure = function() use ($runner) {
            echo "running task one\n";
            $runner->complete();
        };
        $something_in_closure();
    });

    /**
     * use $global_message_thing as reference so we can manipulate it
     * This will make sure that the follow up on this one knows the status of what happened here
     */
    $second_follow_up = new Task(function(Task $runner) use (&$global_message_thing)
    { 
        echo "second follow up on task one.\n";
        $global_message_thing = "success";
        $runner->complete();
    });

    /**
     * Just doing things in random order to show that order doesn't really matter with a task scheduler
     * just the follow ups
     */
    $tasks[] = $task1;

    $tasks[] = new Task(function(Task $runner) 
    {
        echo "running task 2\n";
        $runner->complete();
    });

    $task1->addFollowUp(new Task(function(Task $runner) 
    { 
        echo "follow up on task one.\n";
        $runner->complete();
    }));

    $task1->addFollowUp($second_follow_up);

    /**
     * Adding the references to our "status" trackers here to know what to print
     * One will still be on failed because we did nothing with it. this way we know it works properly
     * as a control.
     */
    $second_follow_up->addFollowUp(new Task(function(Task $runner) use (&$global_message_thing, &$second_global_message_thing) {
        if($global_message_thing === "success") {
            echo "follow up on the second follow up, three layers now, w00007!\n";
        }
        if($second_global_message_thing === "success") {
            echo "you don't see this\n";
        }
        $runner->complete();
    }));
    return $tasks;
};
/**
 * Normally you'd use some aggregating function to build up your task
 * list or a collection of classes. I simulated that here with this callback function.
 */
$tasks = $provided_task_scheduler_from_somewhere();

foreach($tasks as $task) {
    $task->run();
}

This way you can nest tasks that need to follow one another. With some clever use of closures, you can pass parameters to the executing functions and to the encompassing objects outside them.

In my example the Task object itself is passed to the executing function, so the executing function can call complete when it's done with its job.
When complete is called, the Task checks whether it has scheduled follow-up tasks and, if so, runs them automatically, working its way down the chain like that.

It's a rudimentary task scheduler, but it should help you on your way to getting the steps planned in the order you want them executed.
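To tie the follow-up idea back to the curl requests in the question, here is a minimal sketch, not a drop-in solution. It assumes each "header" is given as an ordered list of URLs (a chain). The names `runChains`, `startRequest` and `nextUrl` are hypothetical, and so is the decision to continue a chain even when a step fails. `CURLOPT_PRIVATE` is used to remember which chain a handle belongs to, and `curl_multi_info_read` reports when a single request has finished, so the next URL of that chain can be added while the other chains keep running.

```php
<?php
// Hypothetical helper: removes and returns the next URL of a chain,
// or null when that chain has nothing left to run.
function nextUrl(array &$chains, string $key): ?string
{
    if (empty($chains[$key])) {
        return null;
    }
    return array_shift($chains[$key]);
}

// Hypothetical helper: creates one request and attaches it to the multi handle.
function startRequest($mh, string $key, string $url): void
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 15,
        CURLOPT_PRIVATE        => $key, // remember which chain this handle belongs to
    ]);
    curl_multi_add_handle($mh, $ch);
}

// $chains is assumed to look like ['chainKey' => [url1, url2, ...], ...].
// Chains run in parallel; URLs inside one chain run one after another.
function runChains(array $chains): array
{
    $mh      = curl_multi_init();
    $results = [];
    $pending = 0; // number of requests currently attached to $mh

    // Start only the first URL of every chain; these all run concurrently.
    foreach (array_keys($chains) as $key) {
        $url = nextUrl($chains, (string) $key);
        if ($url !== null) {
            startRequest($mh, (string) $key, $url);
            $pending++;
        }
    }

    do {
        curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0); // wait up to 1s for activity
        }

        // Handle every request that has just completed.
        while ($done = curl_multi_info_read($mh)) {
            $ch  = $done['handle'];
            $key = curl_getinfo($ch, CURLINFO_PRIVATE);

            $results[$key][] = [
                'url'  => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
                'ok'   => $done['result'] === CURLE_OK,
                'body' => curl_multi_getcontent($ch),
            ];

            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
            $pending--;

            // The previous step of this chain is finished, so the next may start.
            // Assumption: the chain continues even if this step failed.
            $url = nextUrl($chains, $key);
            if ($url !== null) {
                startRequest($mh, $key, $url);
                $pending++;
            }
        }
    } while ($pending > 0);

    curl_multi_close($mh);
    return $results;
}
```

With the example from the question, single tasks become chains of length one, so something like `runChains(['A' => [$urlA], 'B' => [$urlB], 'C' => [$urlC], 'DE' => [$urlD, $urlE]])` would fire A, B, C and D together and add E only after D has completed. To stop a chain on failure, check `$done['result']` before calling `nextUrl`.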

Upvotes: 1
