Mithaldu

Reputation: 2410

What is currently the most comfortable and reliable cross-platform Perl module to do parallel downloads?

I'm going to have to download a number of datasets by simply POSTing to a URL and getting XML in return. I'll be able to speed this up by doing more than one request at a time, but here's the catch:

It will need to run on both Windows and Linux, so threads and forks are both out. (Since this is purely I/O-bound, I don't think they're needed anyway.)

Additionally, my coworkers aren't at a very high level of Perl understanding, but they need to be able to grasp how to use it (not necessarily what's going on under the hood; knowing the usage is fine). As such, I'd be happy if its API were somewhat simple.

Right now I'm looking at IO::Lambda for this.

Any other suggestions?

Post-mortem: Based on draegtun's suggestion, I've now thrown together this, which does the job perfectly: https://gist.github.com/661386. You might see it on CPAN soonish.

Upvotes: 3

Views: 906

Answers (3)

Øyvind Skaar

Reputation: 2338

Mojo::UserAgent can also do async parallel HTTP. Its API might be easier for non-Perl people to understand than some of the other modules'.

Not sure if it qualifies as "reliable" yet, though.
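For reference, here is a minimal sketch of what concurrent non-blocking POSTs look like with a recent Mojo::UserAgent; the endpoints and form parameters below are placeholders, not anything from the question:

#!/usr/bin/perl

use strict; use warnings;
use Mojo::UserAgent;
use Mojo::IOLoop;

my $ua = Mojo::UserAgent->new;

# Placeholder endpoints and form data.
my @jobs = (
    [ 'http://one.example.com/search' => { keyword => 'Perl' } ],
    [ 'http://two.example.com/query'  => { type    => 'Who is' } ],
);

my $pending = @jobs;
for my $job (@jobs) {
    my ($url, $form) = @$job;
    # Passing a callback makes the request non-blocking.
    $ua->post($url => form => $form => sub {
        my ($ua, $tx) = @_;
        print $tx->res->body;                    # the returned XML
        Mojo::IOLoop->stop unless --$pending;    # stop once all are done
    });
}

Mojo::IOLoop->start;

The callback is really the only new concept coworkers would have to pick up; everything else is an ordinary POST.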

Upvotes: 1

Sinan Ünür

Reputation: 118166

You can try to use LWP::Parallel.

Update:

I just tried to build it on Windows XP with ActiveState's 5.10.1 and encountered a bunch of test failures. Some of them are due to the TEST script blindly prepending .. to all entries in @INC, and others seem to be due to a version mismatch with the LWP::Protocol::* classes.

This is a concern. I might go with Parallel::ForkManager in conjunction with LWP.

#!/usr/bin/perl

use strict; use warnings;
use Config::Std { def_sep => '=' };
use File::Slurp;
use HTTP::Request::Common qw(POST);
use LWP::UserAgent;
use Parallel::ForkManager;

die "No config file specified\n" unless @ARGV;
my ($ini) = @ARGV;

read_config $ini, my %config;

# Run up to 10 download jobs in parallel.
my $pm = Parallel::ForkManager->new(10);

my @urls = @{ $config{''}{url} };

for my $url ( @urls ) {
    # Fork a child for each URL; the parent skips ahead to the next one.
    $pm->start and next;

    # Build the POST request from this URL's section of the config file.
    my $param = [ %{ $config{$url} } ];
    my $request = POST $url, $param;
    my $ua = LWP::UserAgent->new;

    # Derive a filesystem-safe output filename from the request.
    my $fn = sprintf '%s-%s-%s.xml',
                     map $request->$_, qw( method uri content);
    $fn =~ s/\W+/_/g;

    my $response = $ua->request( $request );
    if ( $response->code == 200 ) {
        write_file $fn, \ $response->as_string;
    }
    else {
        warn $response->message, "\n";
    }
    $pm->finish;
}
$pm->wait_all_children;

Here is a sample config file:

url = http://one.example.com/search
url = http://two.example.com/query
url = http://three.example.com/question

[http://one.example.com/search]
keyword = Perl
limit = 20

[http://two.example.com/query]
type = Who is
limit = 10

[http://three.example.com/question]
use = Perl
result = profit

Update:

If you need to convince yourself that execution is not serial, try the following short script:

#!/usr/bin/perl

use strict; use warnings;

use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(2);

for my $sub (1 .. 4) {
    $pm->start and next;
    for my $i ('a' .. 'd') {
        sleep rand 3;
        print "[$sub]: $i\n";
    }
    $pm->finish;
}

$pm->wait_all_children;

Output:

[1]: a
[1]: b
[2]: a
[1]: c
[1]: d
[2]: b
[3]: a
[3]: b
[3]: c
[2]: c
[3]: d
[2]: d
[4]: a
[4]: b
[4]: c
[4]: d

Regarding your comment about "reliability", I believe it's misguided. What you are doing is simulated by the following script:

#!/usr/bin/perl

use strict; use warnings;

use Parallel::ForkManager;
use YAML;

my @responses = parallel_run();

print Dump \@responses;

sub parallel_run {
    my $pm = Parallel::ForkManager->new(2);
    my @responses;
    for my $sub (1 .. 4) {
        $pm->start and next;
        for my $i ('a' .. 'd') {
            sleep rand 3;
            push @responses, "[$sub]: $i";
        }
        $pm->finish;
    }
    $pm->wait_all_children;
    return @responses;
}

The output you get from that will be:

--- []

It is up to you to figure out why. That's why Parallel::ForkManager allows you to register callbacks, just like the ones you are using with AnyEvent::HTTP.
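For completeness, here is a sketch of the callback approach applied to the script above; it assumes a Parallel::ForkManager new enough (0.7.6 or later) to pass a data-structure reference from finish back to a run_on_finish callback:

#!/usr/bin/perl

use strict; use warnings;

use Parallel::ForkManager;
use YAML;

my $pm = Parallel::ForkManager->new(2);
my @responses;

# Runs in the parent every time a child exits; the last argument is
# whatever reference that child handed to finish().
$pm->run_on_finish( sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data) = @_;
    push @responses, @$data if defined $data;
});

for my $sub (1 .. 4) {
    $pm->start and next;
    my @child;
    for my $i ('a' .. 'd') {
        sleep rand 3;
        push @child, "[$sub]: $i";
    }
    $pm->finish(0, \@child);    # ship this child's results to the parent
}
$pm->wait_all_children;

print Dump \@responses;

This time the dump is no longer empty, because the results cross the process boundary explicitly instead of being pushed onto an array the parent never sees.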

What module you use is your own business. Just don't keep making blatantly false statements.

Upvotes: 5

draegtun

Reputation: 22570

Have a look at AnyEvent::HTTP. According to the CPAN Testers platform matrix, it does compile and work on Windows.

Below is a straightforward example of async POSTing (http_post).

use 5.012;
use warnings;
use AnyEvent;
use AnyEvent::HTTP;

my $cv = AnyEvent->condvar;

my @urls = (
    [google => 'http://google.com', 'some body'],
    [yahoo  => 'http://yahoo.com' , 'any body' ],
);

for my $site (@urls) {
    my ($name, $url, $body) = @$site;
    $cv->begin;                           # one more request outstanding
    http_post $url, $body => sub {
        my $xml = shift;                  # response body (the XML)
        do_something_with_this( $name, $xml );
        $cv->end;                         # this request is done
    };
}

# wait till all finished
$cv->recv;
say "Finished";

sub do_something_with_this { say @_ }

NB. Remember: whatever you decide to do in do_something_with_this, try to avoid anything that blocks. See the other non-blocking AnyEvent modules.
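To make that concrete, here is a small hypothetical sketch of the difference: a plain sleep (or any synchronous call) inside the callback stalls the whole event loop, whereas an AnyEvent timer defers the work without blocking anything else:

use strict;
use warnings;
use AnyEvent;

my $cv = AnyEvent->condvar;

# Blocking (bad inside a callback): sleep 2;
# Non-blocking: let the event loop call us back later.
my $w; $w = AnyEvent->timer(
    after => 2,
    cb    => sub {
        undef $w;                      # release the watcher
        warn "deferred work runs here\n";
        $cv->send;
    },
);

$cv->recv;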

/I3az/

Upvotes: 6
