Reputation: 113
This is my first Catalyst app and I'm not sure how to solve the following problem.
The user enters some data in a form and selects a file (up to 100MB) for uploading. After submitting the form, the actual computation takes up to 5 minutes and the results are stored in a DB.
What I want to do is run this process (and maybe also the file upload) in the background to avoid a server timeout. There should be some kind of feedback to the user (like a message "Job has been started" or a progress bar). The form should be blocked while the job is still running. A result page should be displayed once the job has finished.
In hours of reading I stumbled upon concepts like asynchronous requests, job queues, daemons, Gearman, or Catalyst::Plugin::RunAfterRequest.
How would you do it? Thanks for helping a web dev novice!
PS: In my current local app the work is done in parallel with Parallel::ForkManager. For the real app, would it be advisable to use a cloud computing service like Amazon EC2? Or just find a hoster who offers multi-core servers?
Upvotes: 3
Views: 450
Reputation: 113
Somehow I couldn't get the idea of File::Queue. For non-blocking parallel execution, I ended up using a combination of TheSchwartz and Parallel::Prefork like it is implemented in the Foorum Catalyst App. Basically, there are 5 important elements. Maybe this summary will be helpful to others.
2) A client (DB handle) for the TheSchwartz DB
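package MyApp::TheSchwartz::Client;
use strict;
use warnings;
use TheSchwartz;
use Exporter 'import';
our @EXPORT_OK = qw( theschwartz );    # lets callers import theschwartz()

# Return a TheSchwartz client connected to the job database
sub theschwartz {
    my $theschwartz = TheSchwartz->new(
        databases => [ {
            dsn  => 'dbi:mysql:theschwartz',
            user => 'user',
            pass => 'pass',
        } ],
        verbose => 1,
    );
    return $theschwartz;
}

1;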
3) A job worker (where the actual work is done)
package MyApp::TheSchwartz::Worker::Test;
use strict;
use warnings;
use base qw( TheSchwartz::Worker );    # matches the plain TheSchwartz client in 2)
use MyApp::Model::DB;    # Catalyst DB connect_info
use MyApp::Schema;       # Catalyst DB schema

sub work {
    my $class = shift;
    my $job   = shift;
    my ($args) = $job->arg;
    my ($arg1, $arg2) = @$args;

    # re-use Catalyst DB schema
    my $connect_info = MyApp::Model::DB->config->{connect_info};
    my $schema       = MyApp::Schema->connect($connect_info);

    # do the heavy lifting

    $job->completed();
}

1;
4) A worker process, TheSchwartzWorker.pl, that monitors the job table non-stop
use strict;
use warnings;
use MyApp::TheSchwartz::Client qw/theschwartz/;    # db connection
use MyApp::TheSchwartz::Worker::Test;
use Parallel::Prefork;

my $client = theschwartz();

my $pm = Parallel::Prefork->new({
    max_workers  => 16,
    trap_signals => {
        TERM => 'TERM',
        HUP  => 'TERM',
        USR1 => undef,
    }
});

# The parent keeps forking children until it receives TERM (or HUP)
while ($pm->signal_received ne 'TERM') {
    $pm->start and next;

    # Child process: register the worker class and poll for jobs
    $client->can_do('MyApp::TheSchwartz::Worker::Test');
    my $delay = 10;    # when no job is available, the worker sleeps for $delay seconds
    $client->work( $delay );

    $pm->finish;
}

$pm->wait_all_children();
5) In the Catalyst controller: insert a new job into the table job and pass some arguments
use MyApp::TheSchwartz::Client qw/theschwartz/;

sub start : Chained('base') PathPart('start') Args(0) {
    my ($self, $c) = @_;

    # $arg1, $arg2 and $name stand for values taken from the submitted form
    my $client = theschwartz();
    $client->insert('MyApp::TheSchwartz::Worker::Test', [ $arg1, $arg2 ]);

    $c->response->redirect(
        $c->uri_for(
            $self->action_for('archive'),
            {mid => $c->set_status_msg("Run '$name' started")}
        )
    );
}
The new run is greyed out on the "archive" page until all results are available in the database.
Upvotes: 2
Reputation: 5779
Put the job in a queue and do it in a different process, outside of the web application. While your Catalyst process is busy, even if it is using Catalyst::Plugin::RunAfterRequest, it cannot process other web requests.
There are very simple queuing systems, like File::Queue. Basically, you assign a job ID to the document, put it in the queue. Another process checks the queue and picks up new jobs.
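To make the idea concrete, here is a rough sketch of such a queue built on a spool directory. Note that this is not File::Queue's API; it is a stand-in that uses only core modules, and the helper names enqueue_job and claim_next_job as well as the ".job"/".working" file suffixes are made up for illustration.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: store the payload as "<id>.job" in the spool directory.
sub enqueue_job {
    my ($spool, $payload) = @_;
    my $id  = join '-', time, $$, int rand 1_000_000;
    my $tmp = File::Spec->catfile($spool, "$id.tmp");
    open my $fh, '>', $tmp or die "Cannot write $tmp: $!";
    print {$fh} $payload;
    close $fh or die "Cannot close $tmp: $!";
    # Rename last, so a worker never sees a half-written job file.
    rename $tmp, File::Spec->catfile($spool, "$id.job")
        or die "Cannot rename $tmp: $!";
    return $id;
}

# Hypothetical helper: claim one pending job (roughly oldest first),
# or return an empty list when nothing is waiting.
sub claim_next_job {
    my ($spool) = @_;
    opendir my $dh, $spool or die "Cannot open $spool: $!";
    my @pending = sort grep { /\.job\z/ } readdir $dh;
    closedir $dh;
    for my $file (@pending) {
        my $from = File::Spec->catfile($spool, $file);
        (my $to = $from) =~ s/\.job\z/.working/;
        # rename is atomic within one filesystem, so only one worker wins.
        next unless rename $from, $to;
        open my $fh, '<', $to or die "Cannot read $to: $!";
        my $payload = do { local $/; <$fh> };
        close $fh;
        (my $id = $file) =~ s/\.job\z//;
        return ($id, $payload);
    }
    return;
}

# Demo: the web app enqueues, a separate worker process would claim.
my $demo_spool = tempdir(CLEANUP => 1);
my $demo_id    = enqueue_job($demo_spool, 'file=/tmp/upload.dat');
my ($claimed_id, $claimed_payload) = claim_next_job($demo_spool);
print "claimed $claimed_id: $claimed_payload\n";
```

The web application would only call enqueue_job; a separate long-running script would loop over claim_next_job and sleep when it returns nothing.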
You can save the job status in a database, or anywhere else that is accessible to the web application. On the front end, you can poll the job status every X seconds or minutes to give feedback to the user.
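As a sketch of the polling side, the web application could expose a small endpoint that returns the job status as JSON. In the example below a plain hash stands in for the status table, and the helper name job_status_json and the fields state and percent are assumptions of mine, not part of any particular module.

```perl
use strict;
use warnings;
use JSON::PP;

# Stand-in for a status table; in a real app this would be a DB table
# that the worker updates and the web application reads.
my %job_status = (
    42 => { state => 'running', percent => 60 },
    43 => { state => 'done',    percent => 100 },
);

# Hypothetical helper: build the JSON body a polling endpoint could return.
# Unknown job IDs get a placeholder row instead of an error.
sub job_status_json {
    my ($store, $job_id) = @_;
    my $row = $store->{$job_id} // { state => 'unknown', percent => 0 };
    return JSON::PP->new->canonical->encode({ job_id => 0 + $job_id, %$row });
}

print job_status_json(\%job_status, 42), "\n";
```

The front end would request this every few seconds and switch to the result page once state is 'done'.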
You have to figure out how much memory and CPU you need. Multi-core CPU or multiple CPUs may not be required, even if you have several processes running. Choosing between a dedicated server or cloud like EC2 is more about the flexibility (resizing, snapshot, etc.) vs. price.
Upvotes: 1