Reputation: 416
Is there any option to limit the number of parallel running threads? For example, I have the following code:
use threads;
use LWP::UserAgent qw( );

my $ua = LWP::UserAgent->new();

my @threads;
# if @threads < 200
for my $url (@URL_LIST) {
    push @threads, async { $ua->get($url) };
}

# if @threads <= 200
for my $thread (@threads) {
    my $response = $thread->join;
    ...
}
I'm trying to make the script run only 200 parallel requests at a time, since @URL_LIST contains more than 10000 URLs. But unfortunately the script warns at exit that more than 20 threads are still unfinished. Any ideas what the solution should be?
Upvotes: 0
Views: 676
Reputation: 385789
You previously asked this question in a comment on a question about collecting the responses in the same order the requests were placed, and the code you posted was copied from an answer to that question. As such, I presume that is what you want here as well.
What follows isn't the most efficient solution, since there's no thread reuse, but it makes it easy to collect the responses in the order you desire.
use threads;
use LWP::UserAgent qw( );

my @urls = ...;

my $ua = LWP::UserAgent->new();

my @threads;
for (1..200) {
    last if !@urls;
    my $url = shift(@urls);
    push @threads, async { $ua->get($url) };
}

while (@threads) {
    my $thread = shift(@threads);
    my $response = $thread->join;

    if (@urls) {
        my $url = shift(@urls);
        push @threads, async { $ua->get($url) };
    }

    ...
}
By using the worker model, you can reuse threads to avoid the time it takes to start them up. This also collects the responses in the order you desire.
use threads;
use LWP::UserAgent qw( );
use Thread::Queue 3.01 qw( );  # 3.01 for ->end()

use constant NUM_WORKERS => 200;

my @urls = ...;

my $request_q  = Thread::Queue->new();
my $response_q = Thread::Queue->new();

my @threads;
for (1..NUM_WORKERS) {
    push @threads, async {
        my $ua = LWP::UserAgent->new();
        while (my $url = $request_q->dequeue()) {
            $response_q->enqueue([ $url, $ua->get($url) ]);
        }
    };
}

$request_q->enqueue($_) for @urls;
$request_q->end();

my %responses;
for my $url (@urls) {
    while (!$responses{$url}) {
        my ($response_url, $response) = @{ $response_q->dequeue() };
        $responses{$response_url} = $response;
    }

    my $response = delete($responses{$url});
    ...
}

$_->join for @threads;
Upvotes: 2
Reputation: 4104
Instead of spawning a thread to handle each single URL, perhaps you should spawn a constant number of worker threads that pull URLs from a Thread::Queue object and dump results into another such queue. When the URL queue empties, the worker threads can then end themselves, and you're left processing the results queue...
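That pattern can be sketched as follows. This is a minimal, self-contained illustration, not your actual script: the worker count and the work itself (uppercasing a string) are placeholders standing in for the 200 workers and the `$ua->get($url)` call, so the queue mechanics are visible on their own.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue 3.01 qw( );  # 3.01 for ->end()

# Placeholder pool size; the real script would use 200.
use constant NUM_WORKERS => 4;

my $request_q  = Thread::Queue->new();
my $response_q = Thread::Queue->new();

# Spawn a fixed number of workers that drain the request queue.
my @threads;
for (1 .. NUM_WORKERS) {
    push @threads, async {
        while (defined( my $item = $request_q->dequeue() )) {
            # Placeholder work; a real worker would do $ua->get($item).
            $response_q->enqueue([ $item, uc($item) ]);
        }
    };
}

# Feed the work, then end() the queue so dequeue() returns undef
# once it is empty and the workers fall out of their loops.
$request_q->enqueue($_) for qw( foo bar baz );
$request_q->end();

$_->join for @threads;
$response_q->end();

# Collect the results; arrival order is not guaranteed, which is
# why each result is tagged with the item it belongs to.
my %results;
while (defined( my $pair = $response_q->dequeue() )) {
    my ($item, $result) = @$pair;
    $results{$item} = $result;
}
print "$_ => $results{$_}\n" for sort keys %results;
```

Ending the request queue is what lets the workers terminate cleanly instead of blocking forever in `dequeue()`, which addresses the "threads unfinished" warning from the question.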
Upvotes: 6